ORDERINGS OF WEAKLY CORRELATED RANDOM VARIABLES, 
AND PRIME NUMBER RACES WITH MANY CONTESTANTS 


ADAM J HARPER AND YOUNESS LAMZOURI 

Abstract. We investigate the race between prime numbers in many residue classes modulo 
$q$, assuming the standard conjectures GRH and LI. 

Among our results we exhibit, for the first time, prime races modulo $q$ with $n$ competitor 
classes where the biases do not dissolve when $n, q \to \infty$. We also study the leaders in 
the prime number race, obtaining asymptotic formulae for logarithmic densities when the 
number of competitors can be as large as a power of $q$, whereas previous methods could 
only allow a power of $\log q$. 

The proofs use harmonic analysis related to the Hardy-Littlewood circle method to control 
the average size of correlations in prime number races. They also use various probabilistic 
tools, including an exchangeable pairs version of Stein's method, normal comparison tools, 
and conditioning arguments. In the process we derive some general results about orderings 
of weakly correlated random variables, which may be of independent interest. 


1. Introduction 

In an 1853 letter to Fuss, Chebyshev noted that on a fine scale there seem to be more 
primes congruent to 3 than to 1 modulo 4. This observation led to the birth of comparative 
prime number theory, which investigates the discrepancies in the distribution of prime 
numbers. A central problem is the so-called "Shanks-Rényi prime number race", which is 
described by Knapowski and Turán [13]: let $q \ge 3$ and $2 \le n \le \varphi(q)$ be positive 
integers (where the Euler function $\varphi(q)$ denotes the number of residue classes mod $q$ 
that are coprime to $q$), and denote by $\mathcal{A}_n(q)$ the set of ordered $n$-tuples 
$(a_1, a_2, \dots, a_n)$ of distinct residue classes that are coprime to $q$. For 
$(a_1, a_2, \dots, a_n) \in \mathcal{A}_n(q)$ consider a game with $n$ players called "$a_1$" 
through to "$a_n$", where at time $x$, the player $a_j$ has a score of $\pi(x; q, a_j)$ (where 
$\pi(x; q, a)$ denotes the number of primes $p \le x$ with $p \equiv a \bmod q$). Among the 
questions that Knapowski and Turán asked in [13] are the following: 

Q1. Will each player take the lead for infinitely many integers $x$? 

Q2. Will all $n!$ orderings of the players occur for infinitely many integers $x$? 

It is generally believed that the answer to the stronger question Q2 (and thus to Q1) 
is yes for all $q$ and all $(a_1, a_2, \dots, a_n) \in \mathcal{A}_n(q)$. An old result of 
Littlewood [17] shows that this is indeed true when $(q, a_1, a_2) = (4, 1, 3)$ and 
$(q, a_1, a_2) = (3, 1, 2)$. Since then, this problem has been extensively studied by various 
authors, including Knapowski and Turán [13], Kaczorowski [10, 11, 12], Feuerverger and 
Martin [2], Ford and Konyagin [5], Ford, Konyagin and Lamzouri [7], Fiorilli and Martin [4], 
Fiorilli [3], and Lamzouri [14, 15]. For a complete history as well as recent developments, 
see the expository papers of Granville and Martin [8], Ford and Konyagin [6], and Martin and 
Scarfy [18] (which includes a very comprehensive list of references). 
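
The race described above is easy to experiment with for small moduli. The following Python sketch is an illustration only and is not part of the paper's argument; the helper names `primes_up_to`, `race_counts` and `lead_fraction` are our own. It tabulates the scores $\pi(x; q, a)$ and the proportion of integers $x$ at which one class is strictly ahead of another.

```python
def primes_up_to(limit):
    """Return all primes <= limit using the sieve of Eratosthenes (limit >= 2)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    p = 2
    while p * p <= limit:
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = 0
        p += 1
    return [i for i in range(2, limit + 1) if sieve[i]]


def race_counts(q, residues, limit):
    """Return {a: pi(limit; q, a)} for each residue class a in `residues`."""
    counts = {a % q: 0 for a in residues}
    for p in primes_up_to(limit):
        if p % q in counts:
            counts[p % q] += 1
    return counts


def lead_fraction(q, a, b, limit):
    """Fraction of integers 2 <= x <= limit with pi(x; q, a) > pi(x; q, b)."""
    primes = set(primes_up_to(limit))
    count_a = count_b = ahead = 0
    for x in range(2, limit + 1):
        if x in primes:
            if x % q == a % q:
                count_a += 1
            elif x % q == b % q:
                count_b += 1
        if count_a > count_b:
            ahead += 1
    return ahead / (limit - 1)


if __name__ == "__main__":
    print(race_counts(4, (1, 3), 100))   # {1: 11, 3: 13}
    print(lead_fraction(4, 3, 1, 1000))  # class 3 ahead for most x in this range
    print(lead_fraction(4, 1, 3, 1000))  # class 1 is never strictly ahead here
```

In this small range the class 3 mod 4 leads almost throughout, in line with Chebyshev's observation. Such a finite computation says nothing about Q1 or Q2, which concern infinitely many $x$.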

Assuming the Generalized Riemann Hypothesis GRH and the Linear Independence hypothesis 
LI (which is the assumption that the nonnegative imaginary parts of the nontrivial 
zeros of Dirichlet $L$-functions attached to primitive characters are linearly independent over 
$\mathbb{Q}$), Rubinstein and Sarnak [22] affirmatively answered questions Q1 and Q2. In fact, 
under these hypotheses, they established the stronger result that for any 
$(a_1, \dots, a_n) \in \mathcal{A}_n(q)$, the set of real numbers $x \ge 2$ such that 

\[ \pi(x; q, a_1) > \pi(x; q, a_2) > \cdots > \pi(x; q, a_n) \tag{1.1} \]

has a positive logarithmic density, which we denote by $\delta(q; a_1, \dots, a_n)$. (Recall 
that the logarithmic density of a subset $S$ of $\mathbb{R}$ is defined as 

\[ \lim_{x \to \infty} \frac{1}{\log x} \int_{t \in S \cap [2, x]} \frac{dt}{t}, \]

provided that this limit exists.) This density can be regarded as the "probability" that for 
each $1 \le j \le n$, the player $a_j$ is at the $j$-th position in the prime race. 
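
To make the definition of logarithmic density concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the set $S$ is a union of intervals $[x, x+1)$ over integers $x$ satisfying a given predicate, which is the relevant situation for prime races because $\pi(x; q, a)$ only changes at integers. The names `log_density_up_to` and `has_odd_digit_count` are hypothetical helpers of ours.

```python
import math


def log_density_up_to(indicator, X):
    """Approximate (1/log X) * integral over S ∩ [2, X] of dt/t, where S is the
    union of the intervals [x, x+1) over integers 2 <= x < X with indicator(x) true."""
    total = 0.0
    for x in range(2, X):
        if indicator(x):
            # integral of dt/t over [x, x+1) equals log((x+1)/x)
            total += math.log((x + 1) / x)
    return total / math.log(X)


def has_odd_digit_count(x):
    """True when x has an odd number of decimal digits."""
    return len(str(x)) % 2 == 1


if __name__ == "__main__":
    for exponent in (2, 4, 6):
        print(exponent, log_density_up_to(has_odd_digit_count, 10 ** exponent))
```

The set of numbers with an odd number of digits is a standard example of a set with no natural density, whereas the quantity computed above tends to 1/2 as $X \to \infty$. This illustrates why the logarithmic scale is a natural one for oscillating sets.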

Among their results on question Q2, Rubinstein and Sarnak [22] showed that for $n$ fixed, 

\[ \lim_{q \to \infty} \max_{(a_1, \dots, a_n) \in \mathcal{A}_n(q)} \left| \delta(q; a_1, \dots, a_n) - \frac{1}{n!} \right| = 0. \tag{1.2} \]


Feuerverger and Martin [2] raised the question of having a uniform version of this statement, 
in which the number of contestants $n \to \infty$ as $q \to \infty$. In response to this, Lamzouri 
established that for any integer $n$ such that $2 \le n \le \sqrt{\log q}$ we have 

\[ \delta(q; a_1, \dots, a_n) = \frac{1}{n!} \left( 1 + O\left( \frac{n^2}{\log q} \right) \right) \]

uniformly for all $n$-tuples $(a_1, \dots, a_n) \in \mathcal{A}_n(q)$. Feuerverger and Martin [2] 
also asked whether for $n$ sufficiently large in terms of $q$ the asymptotic formula 
$\delta(q; a_1, \dots, a_n) \sim 1/n!$ might become false. A few years ago, Ford and Lamzouri 
(unpublished) developed a heuristic argument which suggests that there should be a transition 
in the behaviour of the densities when $n = (\log q)^{1+o(1)}$. More specifically, they 
formulated the following conjecture. 


Conjecture 1.1 (Ford and Lamzouri). Let $\epsilon > 0$ be small and $q$ be sufficiently large. 

(1) If $2 \le n \le (\log q)^{1-\epsilon}$, then uniformly for all $n$-tuples 
$(a_1, \dots, a_n) \in \mathcal{A}_n(q)$ we have $\delta(q; a_1, \dots, a_n) \sim 1/n!$ as 
$q \to \infty$. 

(2) If $(\log q)^{1+\epsilon} \le n \le \varphi(q)$, then there exist $n$-tuples 
$(a_1, \dots, a_n), (b_1, \dots, b_n) \in \mathcal{A}_n(q)$ for which 
$n! \cdot \delta(q; a_1, \dots, a_n) \to 0$ and $n! \cdot \delta(q; b_1, \dots, b_n) \to \infty$ 
as $q \to \infty$. 


Our first result establishes part (1) of this conjecture. 

Theorem 1.2. Assume GRH and LI. Let $2 \le n \le \log q / (\log\log q)^4$ be a positive integer. 
Then, uniformly for all $n$-tuples $(a_1, \dots, a_n) \in \mathcal{A}_n(q)$ we have 

\[ \delta(q; a_1, \dots, a_n) = \frac{1}{n!} \left( 1 + O\left( \frac{n (\log n)^4}{\log q} \right) \right). \]

The second part of Conjecture 1.1 implies, in particular, that the asymptotic formula 
$\delta(q; a_1, \dots, a_n) \sim 1/n!$ need not hold for all $n$-tuples 
$(a_1, \dots, a_n) \in \mathcal{A}_n(q)$, if $n$ lies in the range 
$(\log q)^{1+\epsilon} \le n \le \varphi(q)$. We believe this to be true because we encounter 
various error terms in our arguments, from different sources, of size about $1/q$. Thus we 
believe that when the target density $1/n!$ becomes much smaller than any negative power of $q$, 
which happens when $n > (\log q)^{1+\epsilon}$, it is no longer reasonable to expect all of the 
$\delta(q; a_1, \dots, a_n)$ to be close to $1/n!$. Our next result shows that this is indeed 
the case in the smaller range $\varphi(q)^{\epsilon} \le n \le \varphi(q)$. Thus we are able to 
exhibit, for the first time, prime number races for which the biases do not dissolve when 
$q \to \infty$, confirming the prediction of Feuerverger and Martin [2]. 


Theorem 1.3. Assume GRH and LI. Let $\epsilon > 0$ and let $q$ be sufficiently large in terms 
of $\epsilon$. For every integer $\varphi(q)^{\epsilon} \le n \le \varphi(q)$ there exists an 
$n$-tuple $(a_1, \dots, a_n) \in \mathcal{A}_n(q)$ such that 

\[ \delta(q; a_1, \dots, a_n) < (1 - c_\epsilon) \frac{1}{n!}, \]

for some positive constant $c_\epsilon$ which depends only on $\epsilon$. 


We next consider a stronger form of question Q1, concerning the leader in a prime number 
race with many contestants. By the work of Rubinstein and Sarnak, it follows that for any 
$(a_1, \dots, a_n) \in \mathcal{A}_n(q)$, the set of real numbers $x \ge 2$ such that 

\[ \pi(x; q, a_1) > \max_{2 \le j \le n} \pi(x; q, a_j) \]

has a positive logarithmic density which we denote by $\delta_1(q; a_1, \dots, a_n)$. 
Kaczorowski [10] has considered this leadership question (in the special case $a_1 = 1$), and 
obtained some positive lower density results assuming only GRH rather than LI. One can ask the 
following natural quantitative question: 

Q3. Will each of the players $a_1, \dots, a_n$ have an "equal chance" $1/n$ of leading the race, 
when $q \to \infty$? 

It follows from Theorem 1.2 that the answer to this question is yes, if the number of 
contestants $n$ lies in the range $2 \le n = o(\log q / (\log\log q)^4)$. Using a different 
approach we extend this significantly, showing that $n$ can be as large as a small power of $q$. 

Theorem 1.4. Assume GRH and LI. Let $2 \le n \le \varphi(q)^{1/32}$ be an integer. Then, 
uniformly for all $n$-tuples $(a_1, \dots, a_n) \in \mathcal{A}_n(q)$ we have 


5i(g;ai ,..., aA) = - { I + O 


n 


/ n 

\(p(g)i/8 ' (nlogg)^2/25 


+ 

The key ingredient in the proof of this theorem is the following probabilistic result, which 
may be of independent interest. It investigates the probability that a given random variable 
is the leader among weakly correlated Gaussian random variables. 


Theorem 1.5. Suppose $\epsilon > 0$ is sufficiently small, and $n$ is sufficiently large. Let 
$X_1, \dots, X_n$ be mean zero, variance one, jointly normal random variables, and write 
$r_{i,j} := \mathbb{E} X_i X_j$, and suppose that $|r_{i,j}| \le \epsilon$ whenever $i \ne j$. 
Then 

mXi > max Xi) - l/n\ < V In d + V |n 

2<i<n 

2<i<n 2<i<ji<ri. 


Note that the probability would be exactly $1/n$ if the $X_i$ were independent of one another, 
by symmetry. 
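
The symmetry remark can be checked numerically. The Python sketch below is a rough Monte Carlo illustration under simplifying assumptions of our own: only the pair $(X_1, X_2)$ is allowed a correlation `rho`, all other pairs are independent, and the function name `leader_probability` is hypothetical. It is not the method used to prove Theorem 1.5.

```python
import math
import random


def leader_probability(n, rho, trials, seed=0):
    """Monte Carlo estimate of P(X_1 > max_{2<=i<=n} X_i) for standard normals
    X_1, ..., X_n (n >= 2) that are independent except that corr(X_1, X_2) = rho."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x = z[:]
        # X_2 = rho * Z_1 + sqrt(1 - rho^2) * Z_2 has variance one and correlation rho with X_1
        x[1] = rho * z[0] + math.sqrt(1.0 - rho * rho) * z[1]
        if x[0] > max(x[1:]):
            wins += 1
    return wins / trials


if __name__ == "__main__":
    n = 5
    print("independent:", leader_probability(n, 0.0, 20000, seed=1), "vs 1/n =", 1 / n)
    print("rho = 0.3  :", leader_probability(n, 0.3, 20000, seed=1))
```

With `rho = 0` the estimate should be close to $1/n$. For small nonzero `rho` the deviation from $1/n$ is small, and sampling error may well hide it at this number of trials.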

We also consider a variant of question Q3 concerning the ordering of the first $k$ contestants 
in a prime race with many competitors. To this end, for each integer $1 \le k \le n$, we define 
$\delta_k(q; a_1, \dots, a_n)$ to be the logarithmic density of the set of real numbers 
$x \ge 2$ such that 

\[ \pi(x; q, a_1) > \pi(x; q, a_2) > \cdots > \pi(x; q, a_k) > \max_{k+1 \le j \le n} \pi(x; q, a_j). \]

Note that the cases $k = n - 1$ and $k = n$ both correspond to the full ordering (1.1), and 
hence $\delta_{n-1}(q; a_1, \dots, a_n) = \delta_n(q; a_1, \dots, a_n) = \delta(q; a_1, \dots, a_n)$. 

It follows from Theorem 1.2 that in the range $1 \le k \le n - 1 \le \log q / (\log\log q)^4$, 
we have 

\[ \delta_k(q; a_1, \dots, a_n) = \frac{(n-k)!}{n!} \left( 1 + O\left( \frac{n (\log n)^4}{\log q} \right) \right). \]


Now $(n-k)!/n!$ is roughly $n^{-k}$ as $n \to \infty$, so the heuristic discussed following 
Theorem 1.2 leads us to expect that $\delta_k(q; a_1, \dots, a_n) \sim (n-k)!/n!$ even for very 
large $n$, provided roughly that $k \le (\log q)/\log n$. We will show that the asymptotic does 
hold on almost this entire range of $k$, for all $n$ such that $(\log n)/\log q \to 0$ as 
$q \to \infty$. Moreover, unlike the case $k = 1$ (where Theorem 1.4 permits $n$ to be a small 
fixed power of $q$), we show that the condition $(\log n)/\log q \to 0$ is necessary to 
guarantee an asymptotic for any $k \ge 2$. 


Theorem 1.6. Assume GRH and LI. Let 2 < k < n—2 be integers, and suppose (log g)/ logn 
is large enough and k (log kY^ < (logg)/logu. Then, uniformly for all n-tuples (ai,...,a„) G 
An(q) we have 


dk(q^ • • • ) ®n) 


(n — k)\ 


(^1 + 0 



Theorem 1.7. Assume GRH and LI. Let e > 0 and let q be sufficiently large in terms of 
e. Let k >2 be fixed and let n be an integer in the range ‘p(qY < n < </3(g)^^^^. Then there 
exists an n-tuple (ui,..., On) G An(q) such that 

r / \ ^(n-k)\ 

Ok(qi ^1; • • • : ) ^ ( 1 i 5 

' ^ nl 

for some positive constant that depends only on $\epsilon$. 

Theorem 1.3 can be deduced from Theorem 1.7 as follows. Take $k = 2$, let $\epsilon > 0$ be 
suitably small, and suppose first that $n$ is an integer in the range covered by Theorem 1.7. 
Then it follows from Theorem 1.7 that there exists an $n$-tuple 
$(a_1, \dots, a_n) \in \mathcal{A}_n(q)$ such that 

\[ \delta_2(q; a_1, \dots, a_n) < (1 - c_\epsilon) \frac{(n-2)!}{n!} \tag{1.3} \]

for some positive constant $c_\epsilon$. Let $S_{n-2}$ be the symmetric group on $n - 2$ 
elements, regarded as permuting the indices $3, \dots, n$. Since the logarithmic density of the 
set of real numbers $x \ge 2$ for which $\pi(x; q, a) = \pi(x; q, b)$ is 0 (which follows from 
equation (2.1) below), we get 

\[ \delta_2(q; a_1, \dots, a_n) = \sum_{\sigma \in S_{n-2}} \delta(q; a_1, a_2, a_{\sigma(3)}, \dots, a_{\sigma(n)}). \]

Thus by (1.3), there exists $\sigma \in S_{n-2}$ for which 

\[ \delta(q; a_1, a_2, a_{\sigma(3)}, \dots, a_{\sigma(n)}) < (1 - c_\epsilon) \frac{1}{n!}, \]

completing the proof of Theorem 1.3 provided $n$ lies in the range of Theorem 1.7. 

However, if $n$ is larger then we can take $m$ to be the largest integer in that range, and note 
that by the previous discussion there exists an $m$-tuple $(a_1, \dots, a_m) \in \mathcal{A}_m(q)$ 
for which 

\[ \delta(q; a_1, \dots, a_m) < (1 - c) \frac{1}{m!}. \]

Then if we choose any other coprime residues $a_{m+1}, \dots, a_n$ mod $q$, we have 

\[ \delta(q; a_1, \dots, a_m) = \sum_{\substack{\sigma \in S_n \\ \sigma^{-1}(1) < \sigma^{-1}(2) < \cdots < \sigma^{-1}(m)}} \delta(q; a_{\sigma(1)}, a_{\sigma(2)}, \dots, a_{\sigma(n)}). \]

There are $n!/m!$ terms in the sum, so it follows that for at least one permutation $\sigma$ we 
must have $\delta(q; a_{\sigma(1)}, a_{\sigma(2)}, \dots, a_{\sigma(n)}) < (1 - c) \frac{1}{n!}$, 
as claimed. □ 

Next we shall try to indicate the main ideas in the proofs of our theorems. 

The work of Rubinstein and Sarnak [22] showing the existence of the logarithmic densities 
$\delta(q; a_1, \dots, a_n)$ (assuming GRH and LI) in fact shows that 

\[ \delta(q; a_1, \dots, a_n) = \mathbb{P}\big(X(q, a_1) > X(q, a_2) > \cdots > X(q, a_n)\big), \tag{1.4} \]

where each $X(q, a)$ is a sum of the same independent random variables twisted by certain 
arithmetic coefficients depending on $q$ and $a$. Using a quantitative multivariate form of the 
central limit theorem (which we extract from Stein's method, replacing direct and somewhat 
messy characteristic function calculations in the previous literature), one can replace the 
$X(q, a)$ by jointly Gaussian random variables with the same means and covariances. Since 
the behaviour of Gaussians is entirely determined by those means and covariances, our task 
is then to obtain as much information as possible about them (on the number theory side), 
and deduce the best results we can on the ordering probabilities (on the probabilistic side). 

Theorem 1.2 uses a relatively naive probabilistic treatment, namely a direct estimation of 
the relevant multivariate Gaussian density. The improvement over the result of Lamzouri [14] 
comes from substantially improved estimates for the average size of the covariances feeding 
into that density (they are typically small, so we are close to a standard multivariate 
Gaussian). These estimates rely on a harmonic analysis lemma related to the Hardy-Littlewood 
method, which we use to deduce that the differences $a_i - a_j$ cannot too often be divisible by 
large divisors of $q$. 

All of our other theorems exploit more sophisticated tools for comparing a multivariate 
Gaussian distribution with the standard multivariate Gaussian, such as the famous comparison 
lemmas of Slepian (see e.g. Piterbarg [20]) and Li and Shao [16]. However, none of these 
tools seem directly able to prove our theorems, because the probabilities of the events we 
are interested in are rather small and the bounds we have on the off-diagonal covariances 
are comparatively large. For example, in the case of Theorem 1.4 we need to show that 

\[ \delta_1(q; a_1, \dots, a_n) = \mathbb{P}\big(X(q, a_1) > \max_{2 \le j \le n} X(q, a_j)\big) = \frac{1}{n}(1 + o(1)), \]

where potentially $1/n$ is as small as $1/\varphi(q)^{1/32} = 1/q^{1/32 + o(1)}$, whereas the 
largest off-diagonal covariances of the $X(q, a)$ (when they are normalised to have variance 1) 
are $\asymp 1/\log q$. To address this problem, we observe that if $Z_1, Z_2, \dots, Z_n$ are 
independent standard normal random variables, then with very high probability we have 

\[ \max_{2 \le j \le n} Z_j = \sqrt{(2 - o(1)) \log n}, \quad \text{and} \quad \mathbb{P}\big(Z_1 > \sqrt{(2 - o(1)) \log n}\big) = \frac{1}{n^{1 - o(1)}}. \]
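
The two facts about independent standard normals quoted above can be examined with exact Gaussian tail values. The Python sketch below is illustrative only and the function names are ours. It computes the median of $\max_j Z_j$ by bisection and compares it with $\sqrt{2 \log n}$, and it measures the tail probability at height $\sqrt{2 \log n}$ as a power of $1/n$.

```python
import math


def gaussian_tail(t):
    """P(Z > t) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def median_of_max(n):
    """Median of the maximum of n independent standard normals, found by bisection.
    The median t solves (1 - P(Z > t))**n = 1/2, i.e. P(Z > t) = 1 - 2**(-1/n)."""
    target = -math.expm1(math.log(0.5) / n)
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gaussian_tail(mid) > target:
            lo = mid  # tail still too large, so the median lies further right
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n in (10 ** 2, 10 ** 4, 10 ** 6):
        level = math.sqrt(2.0 * math.log(n))
        exponent = math.log(gaussian_tail(level)) / math.log(1.0 / n)
        print(n, median_of_max(n), level, exponent)
```

The printed medians stay somewhat below $\sqrt{2 \log n}$, and the tail at height $\sqrt{2 \log n}$ is $n^{-c}$ with $c$ slightly larger than 1. Both observations are consistent with the $(2 - o(1))$ and $1 - o(1)$ in the display, although the convergence is slow.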

In other words, most of the size $1/n$ of $\mathbb{P}(Z_1 > \max_{2 \le j \le n} Z_j)$ is 
determined just by the probability of $Z_1$ being large enough to possibly be the leader. This 
means that we can "factor out" most of the small size of our target probability $1/n$ by first 
conditioning on $X(q, a_1)$ being roughly large enough to be in the lead, leaving a more 
achievable error bound to be obtained from comparison inequalities. Since the $X(q, a_i)$ are 
not really independent of one another, this conditioning step itself requires some work and the 
use of Slepian's Lemma. Theorem 1.5 is a general probabilistic statement that we will prove 
using these methods. 

Our result on the first $k$ places in the race, Theorem 1.6, is proved by combining the direct 
density arguments of Theorem 1.2 with the conditioning arguments of Theorem 1.4. However, 
since the size of our target probability is now much smaller compared with $n$, the latter part 
of the argument becomes more challenging, and in fact will be the hardest element of this 
paper. In particular, we need to prove a modified normal comparison lemma incorporating 
within it an application of Slepian's lemma, and when applying this we exploit the remarkable 
known fact that if the correlations of the random variables $X(q, a)$ have positive sign, then 
they are very small. 

Finally, the negative result Theorem 1.7 is proved by noting that in an event 

\[ X(q, a_1) > X(q, a_2) > \cdots > X(q, a_k) > \max_{k+1 \le j \le n} X(q, a_j), \]

with very high probability the random variables $X(q, a_1), \dots, X(q, a_k)$ will have 
(normalised) size $\sqrt{(2 - o(1)) \log n}$, and so in the relevant probability integral there 
will be terms in the exponential of size $\asymp \log n$. In particular, if there is a 
correlation between the random variables $X(q, a_i)$ of size about $1/\log q$, this will appear 
in the density in the exponential and will noticeably distort the multivariate normal 
probability if $(\log n)/\log q$ isn't small. So by choosing the tuple $(a_1, \dots, a_n)$ so 
that there is a correlation of size $\asymp 1/\log q$, which is (the maximum) possible, we 
obtain an ordering probability that does not converge to uniformity. 

We end by explaining the organisation of the rest of the paper. In section 2 we explicitly 
state the correspondence between logarithmic densities in prime races and orderings of 
suitable random variables. In section 3 we prove average estimates for the covariances of 
those random variables. Section 4 contains various probabilistic tools tailored to our needs, 
as well as the proof of Theorem 1.5 and the new probabilistic preparations for Theorem 1.6. 
Sections 5 and 6 are relatively short, and contain the deductions of Theorems 1.2 and 1.4. 
Finally, section 7 contains the somewhat difficult proof of Theorem 1.6, and section 8 has 
the proof of our negative result, Theorem 1.7. 

We have tried to use notation that will not cause confusion between readers from a 
more number theoretic or a more probabilistic background, but two brief remarks might 
be in order. Firstly, we use Vinogradov's notation $\ll$, which has the same meaning as the 
"big Oh" notation (thus $x \ll x/10$, for example). In particular, $\ll$ does not mean "much 
less than". Secondly, if the implicit constant in a statement depends on some ambient parameter, 
we may adorn the notation with that parameter to reflect the dependence (e.g. we might 
write $f(x) = O_\epsilon(x^\epsilon)$, meaning that $|f(x)| \le C(\epsilon) x^\epsilon$ for some 
$C(\epsilon)$). 


2. Logarithmic densities of prime races and corresponding random variables 


Let $a_1, \dots, a_n$ be distinct reduced residues modulo $q$, and define 

\[ E_{q; a_1, \dots, a_n}(x) := \big(E(x; q, a_1), \dots, E(x; q, a_n)\big), \]

where 

\[ E(x; q, a) := \frac{\log x}{\sqrt{x}} \big(\varphi(q) \pi(x; q, a) - \pi(x)\big), \]

and $\pi(x)$ denotes the total number of primes less than $x$. It turns out that the 
normalization is such that, if we assume GRH, $E_{q; a_1, \dots, a_n}(x)$ varies roughly 
boundedly as $x$ varies. Notice also that 

\[ \pi(x; q, a_1) > \pi(x; q, a_2) > \cdots > \pi(x; q, a_n) \iff E(x; q, a_1) > E(x; q, a_2) > \cdots > E(x; q, a_n). \]

For a nontrivial Dirichlet character $\chi$ modulo $q$, we denote by $\{\gamma_\chi\}$ the 
sequence of imaginary parts of the nontrivial zeros of $L(s, \chi)$. If we assume LI then all of 
the non-negative values of $\gamma_\chi$ are linearly independent over $\mathbb{Q}$, and in 
particular are distinct. Let $\chi_0$ denote the principal character modulo $q$ and define 
$\Gamma = \bigcup_{\chi \ne \chi_0 \bmod q} \{\gamma_\chi\}$. Furthermore, let 
$\{U(\gamma_\chi)\}_{\gamma_\chi \in \Gamma, \gamma_\chi > 0}$ be a sequence of independent random 
variables uniformly distributed on the unit circle. The work of Rubinstein and Sarnak [22] 
implies, under GRH and LI, that for any Lebesgue measurable set $S \subset \mathbb{R}^n$ whose 
boundary has measure zero, the logarithmic density 

\[ \lim_{X \to \infty} \frac{1}{\log X} \int_{\substack{x \in [2, X] \\ E_{q; a_1, \dots, a_n}(x) \in S}} \frac{dx}{x} =: \delta_{q; a_1, \dots, a_n}(S) \]

exists. Moreover, it follows from their work that 

\[ \delta_{q; a_1, \dots, a_n}(S) = \mu_{q; a_1, \dots, a_n}(S), \tag{2.1} \]

where $\mu_{q; a_1, \dots, a_n}$ is the probability measure corresponding to the random vector 
$(X(q, a_1), \dots, X(q, a_n))$, where 

\[ X(q, a) := -c_q(a) + \sum_{\substack{\chi \ne \chi_0 \\ \chi \pmod q}} \mathrm{Re}\left( 2\chi(a) \sum_{\gamma_\chi > 0} \frac{U(\gamma_\chi)}{\sqrt{\frac{1}{4} + \gamma_\chi^2}} \right), \]

with $c_q(a) := -1 + |\{b \pmod q : b^2 \equiv a \pmod q\}|$. Note that for $(a, q) = 1$ the 
function $c_q(a)$ takes only two values: $c_q(a) = -1$ if $a$ is a non-square modulo $q$, and 
$c_q(a) = c_q(1)$ if $a$ is a square modulo $q$. An elementary argument shows that 
$c_q(a) < d(q) \ll_\epsilon q^\epsilon$ for any $\epsilon > 0$, where $d(q) = \sum_{m \mid q} 1$ 
is the usual divisor function. Thus it will turn out that the shifts $c_q(a)$ can essentially be 
ignored when $q \to \infty$. 
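
The shift $c_q(a)$ is simple to compute by brute force. The following Python sketch reads the definition as $-1$ plus the number of square roots of $a$ modulo $q$; the function names `c_q` and `divisor_count` are ours, and the code is meant only as an illustration for small $q$.

```python
def c_q(q, a):
    """Return -1 + #{b mod q : b^2 = a mod q}, by brute force over b."""
    return -1 + sum(1 for b in range(q) if (b * b - a) % q == 0)


def divisor_count(q):
    """Number of positive divisors d(q)."""
    return sum(1 for m in range(1, q + 1) if q % m == 0)


if __name__ == "__main__":
    for q in (3, 4, 8, 15, 24):
        reduced = [a for a in range(1, q) if all(a % p or q % p for p in range(2, q + 1))]
        print(q, {a: c_q(q, a) for a in reduced}, "d(q) =", divisor_count(q))
```

For each modulus printed, the values on reduced residues are either $-1$ (non-squares) or the common value $c_q(1)$ (squares), and they stay below $d(q)$, matching the remarks above.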

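Let $\mathrm{Cov}_{q; a_1, \dots, a_n}$ be the covariance matrix of 
$(X(q, a_1), \dots, X(q, a_n))$. Then a straightforward computation (see also Lemma 2.1 of [14], 
for example) shows that 

\[ \mathrm{Cov}_{q; a_1, \dots, a_n}(i, j) = \begin{cases} \mathrm{Var}(q) & \text{if } i = j, \\ B_q(a_i, a_j) & \text{if } i \ne j, \end{cases} \]

where 

\[ \mathrm{Var}(q) := 2 \sum_{\substack{\chi \ne \chi_0 \\ \chi \pmod q}} \sum_{\gamma_\chi > 0} \frac{1}{\frac{1}{4} + \gamma_\chi^2}, \qquad B_q(a, b) := \sum_{\substack{\chi \ne \chi_0 \\ \chi \pmod q}} \sum_{\gamma_\chi > 0} \frac{\chi(b/a) + \chi(a/b)}{\frac{1}{4} + \gamma_\chi^2}. \]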
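
The covariance computation rests on one elementary identity: if $U$ is uniformly distributed on the unit circle and $z, w$ are complex numbers, then $\mathbb{E}[\mathrm{Re}(zU)\,\mathrm{Re}(wU)] = \mathrm{Re}(z\bar{w})/2$. The Python sketch below checks this numerically by averaging over equally spaced points of the circle; the helper `circle_covariance` is our own name and the check is a sanity test, not a proof.

```python
import cmath
import math


def circle_covariance(z, w, steps=360):
    """Average of Re(z*U) * Re(w*U) over `steps` equally spaced points U on the unit circle."""
    total = 0.0
    for k in range(steps):
        u = cmath.exp(2j * math.pi * k / steps)
        total += (z * u).real * (w * u).real
    return total / steps


if __name__ == "__main__":
    z, w = 1 + 2j, 3 - 1j
    print(circle_covariance(z, w), (z * w.conjugate()).real / 2)
    print(circle_covariance(z, z), abs(z) ** 2 / 2)
```

Taking $z = w$ shows that each term $\mathrm{Re}(zU)$ has variance $|z|^2/2$. Taking $z$ and $w$ to be values of the same character at two residues $a$ and $b$ produces the real part of $\chi(a)\overline{\chi(b)}$, which is the source of the symmetric combination of character values in the off-diagonal covariances.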


We end this section by recording several basic estimates for the quantities $\mathrm{Var}(q)$ 
and $B_q(a, b)$ that will be useful in our subsequent work. 


Lemma 2.1. Assume GRH. Then for any non-principal character $\chi \pmod q$, 

\[ \sum_{\gamma_\chi > 0} \frac{1}{\frac{1}{4} + \gamma_\chi^2} \ll \log q. \tag{2.2} \]

Moreover, we have 

\[ \mathrm{Var}(q) \sim \varphi(q) \log q \quad \text{as } q \to \infty. \tag{2.3} \]

Proof. These estimates follow from Lemma 3.1 of [15], for example. □ 

Lemma 2.2. Assume GRH. For all distinct reduced residues $a, b$ mod $q$, we have 

\[ B_q(a, b) \ll \varphi(q). \tag{2.4} \]

Moreover, if a,b are distinct residue classes such that 1 < |a| < |6| < g/2 and |&|/|a| is 
not a 'prime power, then we have 

(2.5) Bq{a,b) <^\b\{\ogqf. 

On the other hand we have 

(2.6) Bq{l, -1) = -(log2)(y9(g) + O ((logg)^) . 

Finally, ifa,b are distinct residue classes and Bq{a,b) > 0 then 

(2.7) Bq{a,b) logq. 

Proof. The first bound (2.4) corresponds to Corollary 5.4 of [15]. The estimates (2.5) and 
(2.6) follow from Proposition 6.1 of [14]. The fact (2.7) that positive correlations are always 
very small is noted in Remark 5.1 of [15], for example. □ 


3. An average result for the sums of the covariances $B_q(a_i, a_j)$ 

3.1. A double average. In view of Lemma 2.2, all of the non-diagonal covariances in our 
prime number race satisfy (when normalised by the variance $\mathrm{Var}(q)$) 

\[ \frac{|B_q(a_i, a_j)|}{\mathrm{Var}(q)} \ll \frac{1}{\log q}. \]

This bound is useful, but to obtain strong results we will need to exploit the fact that, if 
we are looking at many residue classes $a_1, \dots, a_n$, the covariances will on average be much 
smaller. This was established in Theorem 5 of Lamzouri [14] when averaging over all pairs 
of distinct reduced residue classes, but we will need a strong result when averaging only over 
a subset, which requires quite different methods. 


Correlation Estimate 1. Assume GRH. Let $q$ be large and let $r, s \ge 1$. For any collections 
$a_1, \dots, a_r$ and $b_1, \dots, b_s$ of distinct reduced residue classes modulo $q$, we have 

\[ \sum_{1 \le j \le r} \sum_{\substack{1 \le k \le s \\ b_k \ne a_j}} \frac{|B_q(a_j, b_k)|}{\mathrm{Var}(q)} \ll \frac{\sqrt{rs} \log^2(2rs)}{\log q}. \]

In particular, we have 

\[ \sum_{1 \le k \le s} \frac{|B_q(a_1, b_k)|}{\mathrm{Var}(q)} \ll \frac{\sqrt{s} \log^2(2s)}{\log q} \quad \text{and} \quad \sum_{1 \le j < k \le r} \frac{|B_q(a_j, a_k)|}{\mathrm{Var}(q)} \ll \frac{r \log^2(2r)}{\log q}. \]

Note that this estimate saves roughly a factor of $\sqrt{rs}$ as compared with a trivial 
treatment using the pointwise bound $\ll 1/\log q$. 

To prove Correlation Estimate 1 we shall need two lemmas. The first is the following, 
which will reduce the problem to upper bounding some easier sums. 


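Lemma 3.1. Assume GRH. In the setting of Correlation Estimate 1, for any $1 \le j \le r$ we 
have 

\[ \sum_{\substack{1 \le k \le s \\ b_k \ne a_j}} \frac{|B_q(a_j, b_k)|}{\varphi(q)} \ll \log^2(2s) + \sum_{\substack{1 \le k \le s \\ b_k \ne a_j}} \frac{\Lambda(q/(q, a_j - b_k))}{\varphi(q/(q, a_j - b_k))}, \]

where $\Lambda(n)$ denotes the von Mangoldt function. 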


Proof of Lemma fXil Let x = (glogg)^, and for simplicity of writing set a = Oj. Then it 
follows from Proposition 5.1 of |ll] that 


\Bq{a,bk)\ <^(p{q)il+ | — fefc)/ Mi{q]a,bk) + Mi{q-,bk,a) 


(3.1) 


l£fe<s, (^(q,a-fcfc) j 


+ M 2 {q;a,bk) + M 2 {q-,bk,a) ] +slogg, 


where 


Mi(g;a,d)= 


„-n/x 


n<2x log X 
an=d (mod q) 


n 


, and M 2 (g; a,d) = '^ 


logp 


l<e<2\ogx 
ap^=d mod q/p'- 


pe+u-l(^p_ X)’ 


and p'^ \\ q denotes that is the largest power of p that divides q. 
Now it follows from Lemma 5.3 of HU that 


M, (,; a, 4) = Ihh + o f 1^) < + o ^ 


nk \ q J nk \ q 

where Uk is the least positive residue of bka~^ modulo q. Since (logn)/n is a decreasing 
function for n > 3, we deduce that 




log(nfc) slog^g 


l<fc<S, 


l<k<s, 


nk 


<S 




k<s 


log k log^ q 

k q 


•C log^(2s) + 


s log^ q 


A similar bound holds for bk,a). 

Next we bound the sum ^(M 2 (g; a, bk) + M 2 {q; bk, a)). We have 

logp 


YM2{q]a,bk) = Y 


k=l 


p^\\q e<21oga; 


pi 




(p- 1) 




logp 


e+u—l(.f^ _ 


l<fc<s 

ap'^=bii mod q/p'^ 


p^\\q e<2 logo; 


pi 


(p- 1) 


min{p^, 


since the bk are distinct modulo q. Splitting the outer sum over the primes p dividing q into 
the cases p < s and p > s, we hnd the above is 


■C 


log P 

2-^ 2-^ n^—lf rt 11 ^ 2-^ 2-^ 


logp 


p<s e 


^pe-l(p_ 1) 


p>s e 


^p^(p- 1) 






P<S 


p 


p>s 


logp 

p2 


$\ll \log(2s)$. 

A similar bound holds for $\sum_{k=1}^{s} M_2(q; b_k, a)$. Putting everything together (and 
remembering that $s \le q$, so the error term involving $q$ is $\ll \log^2(2s)$) completes the 
proof of Lemma 3.1. □ 

To control the sum on the right hand side in Lemma 13.11 we shall deploy the following 
harmonic analysis lemma. This will be the really new aspect of our analysis of the covariances. 


Lemma 3.2 (Following pp 305-307 of Bourgain [1], 1989). Let $x$ be large and let $Q \ge 1$. 
Define 

\[ G(\theta) := \sum_{q \le Q} \frac{\Lambda(q)}{q} \sum_{a=0}^{q-1} \mathbf{1}_{\|\theta - a/q\| \le 1/x}, \]

where $\|\cdot\|$ denotes distance to the nearest integer, and $\mathbf{1}$ denotes the indicator 
function. 

Then if $\theta_1, \dots, \theta_R$ and $\phi_1, \dots, \phi_S$ are any real numbers that are 
$1/x$-spaced (i.e. such that $\|\theta_{r_1} - \theta_{r_2}\| \ge 1/x$ when $r_1 \ne r_2$, and 
such that $\|\phi_{s_1} - \phi_{s_2}\| \ge 1/x$ when $s_1 \ne s_2$), we have 


GiOr - fis) < \^\og^{2QRS) + 

l<r<R, ^ 

l<s<S 


Lemma 3.2 encodes the fact that rationals $a/q$ are well spaced, so it is impossible for lots 
of the points $\theta_r - \phi_s$ to be very close to lots of rationals. The proof uses additive 
characters, the problem being analytically nice because we are looking at pairwise differences, 
which corresponds to a convolution on the harmonic analysis side. Since Bourgain's argument is 
given in a very different context, and since our statement of Lemma 3.2 is also different (in 
particular through the presence of the weights $\Lambda(q)$), we provide a sketch proof of the 
lemma in Appendix A. 


Proof of Correlation Estimate 1. We may suppose without loss of generality that $s \ge r$. In 
view of Lemma 3.1 and the fact that $\mathrm{Var}(q) \sim \varphi(q) \log q$ that is contained 
in Lemma 2.1, to prove Correlation Estimate 1 it will certainly suffice to prove that 


E E 




A(g/(g,aj 


Y1 

bk)) 


•C \A^log^(2rs). 


Rewriting a little, the double sum is 
A(n 


< 


n\q 


^ ^ ^ ^ ^aj=bk mod q/n 




«E 

n\q 


A(n) 

/ , / , ^aj=bk mod q/n, 

^k 


n 


where $\mathbf{1}$ denotes the indicator function, and where we used the fact that 
$\varphi(n) \asymp n$ if $n$ is a prime power (and so $\Lambda(n) \ne 0$). We also observe that 
we cannot have $a_j \equiv b_k$ modulo $q/n$ for two values $n$ that are powers of different 
primes, since in that case we would have $a_j \equiv b_k$ modulo $q$, which is false by 
assumption. Therefore the contribution to the sum from those $n > rs$ is trivially 






log(2rs) 


rs 


< log(2rs), 


which is acceptable. 

To bound the contribution to the sum from those n < rs, we apply Lemma [3.21 with the 
choices Q = rs, x = max{g, (rs)^}, 9j := aj/q and (j)k := hk/q for all j, k. Thus all the points 
9j and (pk are indeed 1/x-spaced, and aj = bk modulo q/n ii and only ii 9j — cpk = u/n for 
some integer u. We deduce that 




aj=bk mod q/n — 


< 




n—\ 


n<rs,n\q 




\\idj-<pk)-u/n\\<l/x 


n<rs 

l<k<s 


u=0 

•C \A^log^(2rs) + 1, 


which is enough to prove Correlation Estimate 1. 


□ 


3.2. Some matrix estimates involving the covariances. We will need some information 
about the determinant and inverse of a covariance matrix that is close to the identity matrix. 
Let $\mathcal{M}_n(\epsilon)$ denote the set of all $n \times n$ symmetric matrices whose 
diagonal entries are 1, and whose off-diagonal entries have absolute value at most $\epsilon$. 


Lemma 3.3. If $\epsilon < 1/2n$ then for any $A = (a_{j,k}) \in \mathcal{M}_n(\epsilon)$ we have 

\[ \det(A) = 1 + O\Big( \sum_{\substack{1 \le j, k \le n \\ j \ne k}} a_{j,k}^2 \Big). \]

In addition, if $\epsilon < 1/2n$ then $A$ is invertible, and if we let $\tilde{a}_{j,k}$ denote 
the entries of the inverse matrix $A^{-1}$ then we have 


0‘j,k 


1 T O j e 

l^m 


*/j = k. 


O ( T 'Yhi^j,k ^ 'Pf^l<l,m<n, |n/,m| j 'If 3 ^ k. 

V ’ / 


Lemma 3.3 extends Lemmas 4.1 and 4.2 of Lamzouri [14], which would give roughly the 
same result if all the off-diagonal entries $a_{j,k}$ had size about $\epsilon$, but are weaker 
if the $a_{j,k}$ are on average smaller (as will later be the case for us when we take 
$\epsilon$ of size about $1/\log q$). 
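
As a numerical illustration of the determinant estimate, the Python sketch below builds a symmetric matrix with unit diagonal and small off-diagonal entries and compares $\det(A) - 1$ with the sum of the squared off-diagonal entries. This is a sanity check under assumptions of our own (a constant off-diagonal entry, and hypothetical helper names `determinant` and `near_identity`); it does not follow the permutation-counting proof given in the paper.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting (floating point)."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def near_identity(n, eps):
    """Symmetric n x n matrix with ones on the diagonal and eps elsewhere."""
    return [[1.0 if j == k else eps for k in range(n)] for j in range(n)]


if __name__ == "__main__":
    n, eps = 4, 0.05  # eps < 1/(2n) = 0.125
    A = near_identity(n, eps)
    square_sum = sum(A[j][k] ** 2 for j in range(n) for k in range(n) if j != k)
    print(determinant(A) - 1.0, square_sum)
```

In this example $\det(A) = (1 - \epsilon)^{3}(1 + 3\epsilon)$, so $\det(A) - 1$ is about $-0.014$, comfortably within the sum of squares $0.03$. The transpositions alone contribute $-\sum_{j < k} a_{j,k}^2 = -0.015$, which accounts for most of the deviation.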


Proof of Lemma 3.3. We have 

\[ \det(A) = 1 + \sum_{\substack{\sigma \in S_n \\ \sigma \ne 1}} \mathrm{sgn}(\sigma)\, a_{1, \sigma(1)} \cdots a_{n, \sigma(n)}, \]

where $S_n$ denotes the symmetric group on $n$ elements. We divide the sum according to the 
number $t$ of points that are not fixed by a permutation $\sigma$. Thus the only term with 
$t = 0$ is the identity permutation, which we removed from the sum; there are no terms with 
$t = 1$; and the contribution from $t = 2$ has size at most 

\[ \sum_{1 \le j \le n} \sum_{\substack{1 \le k \le n \\ k \ne j}} |a_{j,k} a_{k,j}| = \sum_{\substack{1 \le j, k \le n \\ j \ne k}} a_{j,k}^2. \]

For any 3 < t < n, by averaging the total contribution is at most 

(J^Sn^Cr{j) = k^ 

G has t non-fixed points cr has t non-fixed points 

In the inner sum we have $\binom{n-2}{t-2}$ choices of points that are not fixed (in addition to 
$j$ and $k$), and for any such choice there are at most $(t-1)!$ ways to construct $\sigma$ such 
that $\sigma(j) = k$. Thus the total contribution from any $3 \le t \le n$ is 

j^k jj^k 

and our claim about $\det(A)$ follows on using the assumption $\epsilon < 1/2n$ and summing over 
$t$. In particular, we see that $\det(A) \ge 1/2$ for all $A \in \mathcal{M}_n(\epsilon)$, so 
$A$ is invertible. 

Next we need to prove the claims about $\tilde{a}_{j,k}$. If $j = k$ then we have 

\[ \tilde{a}_{j,j} = \frac{\det(A_{jj})}{\det(A)}, \]

where $A_{jj}$ denotes the matrix $A$ with the $j$-th row and column removed. In particular we 
have $A_{jj} \in \mathcal{M}_{n-1}(\epsilon)$, and so the claim about $\tilde{a}_{j,j}$ follows 
from our determinant results. 

In the off-diagonal case we have 


1 _ 

det(Afcj) 


det(A) 


0(|det(Al,,,)|) = 0 


zn 

lyGBkj i^k 





where now $B_{kj}$ denotes the set of all bijections from $\{1, 2, \dots, n\} \setminus \{k\}$ to 
$\{1, 2, \dots, n\} \setminus \{j\}$. We again divide the sum according to the number $t$ of 
points that are not fixed by $\sigma$, noting that since $k \ne j$ the point $j$ is necessarily 
always a non-fixed point, and the point $k$ will always be the image of a non-fixed point. Thus 
there are no terms with $t = 0$, and the contribution from $t = 1$ is simply $|a_{j,k}|$. When 
$t = 2$ the contribution is 



Clj,i 


^i^k 


For any $3 \le t \le n - 1$, each $\sigma$ must have at least $t - 2$ non-fixed points that are 
different from $j$, and whose image is different from $k$, so by averaging the total 
contribution is at most 


l<2<n, aGBkj,o-{i)^iJ,k^ 

a has t non-fixed points 


i^kj h^ij^k 


^i,h 


aGtSkj ,cr{i)=h, 
$\sigma$ has $t$ non-fixed points 

In the inner sum we have $\binom{n-3}{t-3}$ choices of points that are not fixed (in addition to 
$j$, $i$ and $h$), and for any such choice there are at most $(t-1)!$ ways to construct $\sigma$ 
such that $\sigma(i) = h$. Thus the total contribution from any $3 \le t \le n - 1$ is 

l<i,h<n, ^ ^ l<i,h<n, 

i^h i^h 


Lemma 3.3 follows on using the assumption $\epsilon < 1/2n$ and summing over $t$. □ 


4. Probabilistic tools and results 

4.1. Passing to the Gaussian case. As described in section 2, the prime number race 
between residue classes $a$ modulo $q$ is associated with random variables of the general shape 

\[ W_a := \sum_{i=1}^{m} \mathrm{Re}\big(c_i(a) V_i\big), \]

where $(V_i)_{1 \le i \le m}$ is a sequence of independent, mean zero, complex valued random 
variables (depending on $q$), and where $c_i(a) \in \mathbb{C}$ are deterministic coefficients. 

In order to access the tools associated with Gaussian random processes, we would like 
a multivariate normal approximation (i.e. multivariate central limit theorem) for the 
$n$-dimensional random vector $W = (W_{a_j})_{1 \le j \le n}$. This will allow us to replace the 
$W_{a_j}$ by Gaussian random variables with the same means and covariances. We need an explicit 
bound for the error arising in the normal approximation, which in particular makes clear the 
dependence on $n$. There are not too many such results in the literature, and we will deduce a 
suitable result from the work of Reinert and Röllin [21]. 


Lemma 4.1 (Following Theorem 2.1 of Reinert and Röllin [21], 2009). Let the situation be 
as described above, let $A$ be a finite set, and let $Z = (Z_a)_{a \in A}$ denote a multivariate 
normal random vector with the same mean vector and covariance matrix as $W = (W_a)_{a \in A}$. 
Assume 
that E|V)|'^ < /m? for all i, for some K > 1. 

Then for any three times differentiable function h : —)■ R we have 


\m{w) -m{z)\ < 


\h\2K^ 


m 


Y \ + f 5^lG(a)| ] , 

a,b&A \ *=1 


i=l \aSA 


where fiZ := sup,^,g^ 11 U and \h\s ;= sup,_,^,g _4 11 | 


93 


Reinert and Röllin's work develops a multivariate version of Stein's method of exchangeable
pairs, and applies in far more general situations than above. In Appendix B we very
briefly indicate how to deduce Lemma 4.1 from Theorem 2.1 of Reinert and Röllin [21] (this
deduction being, by now, a fairly standard calculation).
For the special case of prime number races, we centre and normalize the random variables
$X(q,a_j)$ from Section 2 by setting




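$$Y_j := \frac{X(q,a_j) + C_q(a_j)}{\sqrt{\mathrm{Var}(q)}} = \frac{1}{\sqrt{\mathrm{Var}(q)}} \sum_{\substack{\chi \bmod q \\ \chi \neq \chi_0}} \Re\Big( 2\chi(a_j) \sum_{\gamma_\chi > 0} \frac{U(\gamma_\chi)}{\sqrt{1/4 + \gamma_\chi^2}} \Big), \qquad 1 \leq j \leq n,$$

where the $U(\gamma_\chi)$ are independent random variables distributed uniformly on the unit circle.
Then $Y_1, \ldots, Y_n$ have mean zero and variance 1. Moreover, we have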


ey-y- = 

^ Var(g) logg’ 




by (2.3) and (2.4). In this case, we obtain the following corollary.


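Lemma 4.2. Let $Z = (Z_j)_{1 \leq j \leq n}$ denote a multivariate normal random vector whose
components have mean zero, variance one, and correlations

$$\mathbb{E} Z_i Z_j := \mathbb{E} Y_i Y_j = \frac{B_q(a_i,a_j)}{\mathrm{Var}(q)}.$$

Then for any three times differentiable function $h : \mathbb{R}^n \to \mathbb{R}$ we have

$$|\mathbb{E}h(Y) - \mathbb{E}h(Z)| \ll \frac{n^2 |h|_2 + n^3 |h|_3}{\sqrt{\varphi(q)}}.$$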

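Proof of Lemma 4.2. We apply Lemma 4.1 with the sum over $1 \leq i \leq m$ replaced by a sum
over characters $\chi \neq \chi_0$ mod $q$ (so $m = \varphi(q) - 1$), and with $c_i(a_j)$ replaced by $\chi(a_j)$ and with
$V_i$ replaced by

$$V_\chi := \frac{2}{\sqrt{\mathrm{Var}(q)}} \sum_{\gamma_\chi > 0} \frac{U(\gamma_\chi)}{\sqrt{1/4 + \gamma_\chi^2}}.$$

The $V_\chi$ are indeed independent, mean zero random variables. Moreover, observe that
$\mathbb{E}\, U(\gamma_{\chi,1}) U(\gamma_{\chi,2}) \overline{U(\gamma_{\chi,3})}\, \overline{U(\gamma_{\chi,4})}$ vanishes unless $\{\gamma_{\chi,1}, \gamma_{\chi,2}\} = \{\gamma_{\chi,3}, \gamma_{\chi,4}\}$, and in that case it is $\ll 1$.
Therefore we have

$$\mathbb{E}|V_\chi|^4 \ll \frac{1}{\mathrm{Var}(q)^2} \Big( \sum_{\gamma_\chi > 0} \frac{1}{1/4 + \gamma_\chi^2} \Big)^2 \ll \frac{\log^2 q}{\mathrm{Var}(q)^2} \ll \frac{1}{\varphi(q)^2},$$

where the final inequalities follow from Lemma 2.1. So we have checked that Lemma 4.1 is
applicable with $K$ an absolute constant.

Using the facts that $|c_i(a_j)| = |\chi(a_j)| \leq 1$ and $\#A = n$ and $m = \varphi(q) - 1$, Lemma 4.2
now follows immediately from Lemma 4.1. □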


The above still isn't quite what we need, since we are interested in the probabilities of
certain orderings of the $Y_j$ (or really of the $X(q,a_j)$), and these correspond to the expectations
of indicator functions that are not three times differentiable. So we need to approximate
indicator functions by smooth functions $h$, with some control on the resulting error in the
probabilities. There is a substantial literature that attempts to do this as efficiently as
possible, but for us a simple approach will suffice.


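Lemma 4.3. Let $Y = (Y_j)_{1 \leq j \leq n}$ and $Z = (Z_j)_{1 \leq j \leq n}$ be as in Lemma 4.2. Let $\mathcal{S}$ be any
subset of $\{1,\ldots,n\} \times \{1,\ldots,n\}$ not including any diagonal pairs $(i,i)$, and let $S$ be the subset
of $\mathbb{R}^n$ defined by

$$S := \{(x_1,\ldots,x_n) \in \mathbb{R}^n : x_i > x_j \ \forall (i,j) \in \mathcal{S}\}.$$

Finally, let $\delta > 0$ be a small parameter.

Then

$$|\mathbb{P}(Y \in S) - \mathbb{P}(Z \in S)| \ll \frac{1}{\sqrt{\varphi(q)}} \Big( \frac{n^4}{\delta^2} + \frac{n^6}{\delta^3} \Big) + n^2 \delta.$$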

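Proof of Lemma 4.3. Let $\phi : \mathbb{R} \to \mathbb{R}$ be a three times differentiable function such that

$$\phi(x) \begin{cases} = 1 & \text{if } x \geq \delta, \\ \in [0,1] & \text{if } 0 < x < \delta, \\ = 0 & \text{if } x \leq 0, \end{cases}$$

and $\phi^{(d)}(x) \ll (1/\delta)^d$ for $1 \leq d \leq 3$ (where $\phi^{(d)}$ denotes the $d$-th derivative of $\phi$). Note that
such $\phi$ exists since the interval on which $\phi$ changes from 0 to 1 has length $\delta$.

Now, let $h_\delta^-, h_\delta^+ : \mathbb{R}^n \to \mathbb{R}$ be three times differentiable functions defined by

$$h_\delta^-(x_1,\ldots,x_n) := \prod_{(i,j) \in \mathcal{S}} \phi(x_i - x_j), \qquad h_\delta^+(x_1,\ldots,x_n) := \prod_{(i,j) \in \mathcal{S}} \phi(x_i - x_j + \delta).$$

By repeated application of the product rule we see

$$\frac{\partial^2}{\partial x_a \partial x_b} h_\delta^-(x_1,\ldots,x_n) = \sum_{\substack{(i,j),(k,l) \in \mathcal{S}, \\ (i,j) \neq (k,l)}} \Big( \prod_{\substack{(u,v) \in \mathcal{S}, \\ (u,v) \neq (i,j),(k,l)}} \phi(x_u - x_v) \Big) \frac{\partial}{\partial x_a} \phi(x_i - x_j) \frac{\partial}{\partial x_b} \phi(x_k - x_l) + \sum_{(i,j) \in \mathcal{S}} \Big( \prod_{\substack{(u,v) \in \mathcal{S}, \\ (u,v) \neq (i,j)}} \phi(x_u - x_v) \Big) \frac{\partial^2}{\partial x_a \partial x_b} \phi(x_i - x_j).$$

Here each of the products has absolute value at most 1. The derivatives vanish unless $a \in
\{i,j\}$ and $b \in \{k,l\}$, so there are $\ll n^2$ non-vanishing terms in the sums, and each term
contributes $\ll 1/\delta^2$ because $\phi^{(d)}(x) \ll (1/\delta)^d$. We conclude from this calculation that
$|h_\delta^-|_2 \ll n^2/\delta^2$. An exactly similar argument shows that $|h_\delta^-|_3 \ll n^3/\delta^3$, and the same for $h_\delta^+$.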

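Now observe that $\mathbb{E}h_\delta^-(Y) \leq \mathbb{P}(Y \in S) \leq \mathbb{E}h_\delta^+(Y)$. In addition, by Lemma 4.2 we have

$$|\mathbb{E}h_\delta^-(Y) - \mathbb{E}h_\delta^-(Z)| \ll \frac{1}{\sqrt{\varphi(q)}} \Big( \frac{n^4}{\delta^2} + \frac{n^6}{\delta^3} \Big) \quad \text{and} \quad |\mathbb{E}h_\delta^+(Y) - \mathbb{E}h_\delta^+(Z)| \ll \frac{1}{\sqrt{\varphi(q)}} \Big( \frac{n^4}{\delta^2} + \frac{n^6}{\delta^3} \Big),$$

so to prove Lemma 4.3 it will suffice to show that $|\mathbb{P}(Z \in S) - \mathbb{E}h_\delta^-(Z)| \ll n^2 \delta$ and
$|\mathbb{P}(Z \in S) - \mathbb{E}h_\delta^+(Z)| \ll n^2 \delta$. But this is fairly easy, because we have for example that

$$|\mathbb{P}(Z \in S) - \mathbb{E}h_\delta^-(Z)| \leq \mathbb{P}(|Z_i - Z_j| \leq \delta \text{ for some } (i,j) \in \mathcal{S}) \leq \sum_{(i,j) \in \mathcal{S}} \mathbb{P}(|Z_i - Z_j| \leq \delta).$$

And each difference $Z_i - Z_j$ is a normal random variable with mean zero and variance

$$\mathbb{E}(Z_i - Z_j)^2 = \mathbb{E}Z_i^2 + \mathbb{E}Z_j^2 - 2\mathbb{E}Z_iZ_j = 2 - \frac{2B_q(a_i,a_j)}{\mathrm{Var}(q)},$$

which is bounded below by a positive constant in view of (2.3) and (2.4) (and our assumption
that if $(i,j) \in \mathcal{S}$ then $i \neq j$, and therefore $a_i \neq a_j$). Such a normal random variable has
probability $O(\delta)$ of lying in any $\delta$-ball, and there are at most $n^2$ pairs in $\mathcal{S}$, so we have the
required bound $\ll n^2 \delta$ for the right hand side. □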
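
The smoothing step in the proof above can be illustrated numerically. The following is a minimal sketch, not the authors' construction: it assumes one particular three times differentiable cutoff (a degree 7 polynomial step, chosen here only for illustration), and the names `smooth_step`, `phi`, `h_minus`, `h_plus` and `indicator_S` are hypothetical. It checks the sandwich property that the lower smooth function sits below the indicator of the ordering set and the upper one sits above it.

```python
import random


def smooth_step(t):
    """A C^3 transition from 0 to 1 on [0, 1] (assumed choice: degree-7 polynomial)."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    # 35t^4 - 84t^5 + 70t^6 - 20t^7; its derivative is 140 t^3 (1 - t)^3,
    # so the first three derivatives vanish at t = 0 and t = 1.
    value = t**4 * (35.0 - 84.0 * t + 70.0 * t**2 - 20.0 * t**3)
    return min(1.0, value)  # guard against rounding slightly above 1


def phi(x, delta):
    """Cutoff equal to 0 for x <= 0 and to 1 for x >= delta."""
    return smooth_step(x / delta)


def indicator_S(x, pairs):
    """Indicator of the set {x : x[i] > x[j] for all (i, j) in pairs}."""
    return 1.0 if all(x[i] > x[j] for i, j in pairs) else 0.0


def h_minus(x, pairs, delta):
    """Smooth minorant: product of phi(x_i - x_j) over the pairs."""
    out = 1.0
    for i, j in pairs:
        out *= phi(x[i] - x[j], delta)
    return out


def h_plus(x, pairs, delta):
    """Smooth majorant: product of phi(x_i - x_j + delta) over the pairs."""
    out = 1.0
    for i, j in pairs:
        out *= phi(x[i] - x[j] + delta, delta)
    return out
```

Rescaling the step by $\delta$ multiplies the $d$-th derivative by $\delta^{-d}$, which is the source of the derivative bounds used in the proof.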


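Finally, by making the optimal choice of $\delta$ in Lemma 4.3, and correcting for the small
shifts $C_q(a_j)/\sqrt{\mathrm{Var}(q)}$ in the definition of $Y_j$, we obtain the following result that we shall
actually use.


Normal Approximation Result 1. Let $X = \big( X(q,a_j)/\sqrt{\mathrm{Var}(q)} \big)_{1 \leq j \leq n}$, and let $Z = (Z_j)_{1 \leq j \leq n}$
denote a multivariate normal random vector whose components have mean zero, variance
one, and correlations $\mathbb{E}Z_iZ_j := B_q(a_i,a_j)/\mathrm{Var}(q)$.

Then for any set $S$ as in Lemma 4.3, we have

$$|\mathbb{P}(X \in S) - \mathbb{P}(Z \in S)| \ll \frac{n^3}{\varphi(q)^{1/8}}.$$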

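Proof of Normal Approximation Result 1. Choosing $\delta = n/\varphi(q)^{1/8}$ in Lemma 4.3 is optimal
and leads to the bound

$$|\mathbb{P}(Y \in S) - \mathbb{P}(Z \in S)| \ll \frac{n^3}{\varphi(q)^{1/8}}.$$

Since we have $X = Y - \big( C_q(a_j)/\sqrt{\mathrm{Var}(q)} \big)_{1 \leq j \leq n}$, and as described in Section 2 we always have
$C_q(a_j)/\sqrt{\mathrm{Var}(q)} \ll 1/\varphi(q)^{1/2-\epsilon}$ (which is much smaller than $\delta = n/\varphi(q)^{1/8}$),
then as well as the inequality $\mathbb{E}h_\delta^-(Y) \leq \mathbb{P}(Y \in S) \leq \mathbb{E}h_\delta^+(Y)$ in the proof of Lemma 4.3 we
will actually have $\mathbb{E}h_\delta^-(Y) \leq \mathbb{P}(X \in S) \leq \mathbb{E}h_\delta^+(Y)$, provided the function $\phi$ there is chosen
suitably (e.g. such that $\phi(x) = 0$ if $x \leq \delta/100$, and $\phi(x) = 1$ if $x \geq 99\delta/100$). So we will
have the same bound for $|\mathbb{P}(X \in S) - \mathbb{P}(Z \in S)|$. □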
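
The choice of $\delta$ can be sanity-checked with a few lines of arithmetic. The sketch below assumes the error in Lemma 4.3 has the shape $n^2\delta + (n^4/\delta^2 + n^6/\delta^3)/\sqrt{\varphi(q)}$ with all implied constants set to 1 (an illustrative assumption), and the function names are hypothetical. It shows that $\delta = n/\varphi(q)^{1/8}$ balances the smoothing term against the dominant approximation term, so it is optimal up to constant factors.

```python
import math


def approximation_error(n, phi_q, delta):
    """Model error n^2*delta + (n^4/delta^2 + n^6/delta^3)/sqrt(phi_q), constants set to 1."""
    smoothing = n**2 * delta
    stein = (n**4 / delta**2 + n**6 / delta**3) / math.sqrt(phi_q)
    return smoothing + stein


def balanced_delta(n, phi_q):
    """The choice delta = n / phi_q^(1/8), which equates n^2*delta with n^6/(delta^3 sqrt(phi_q))."""
    return n / phi_q**0.125
```

For example, with $n = 5$ and $\varphi(q) = 10^{16}$ the balanced choice is $\delta = 0.05$, and the two dominant terms both equal $n^3/\varphi(q)^{1/8} = 1.25$.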


4.2. Normal comparison results. In this subsection we record some celebrated results
that let one compare probabilities for multivariate Gaussians with different covariance
matrices.

Normal Comparison Result 1 (Slepian's Lemma, see e.g. Piterbarg [20]). Let $X =
(X_i)_{1 \leq i \leq n}$ and $W = (W_i)_{1 \leq i \leq n}$ be vectors of jointly normal, mean zero, variance one
random variables. Suppose that $\mathbb{E}X_iX_j \leq \mathbb{E}W_iW_j$ for all pairs $i,j$. Then for any real numbers
$u_1, \ldots, u_n$ we have

$$\mathbb{P}(X_i \leq u_i \ \forall\, 1 \leq i \leq n) \leq \mathbb{P}(W_i \leq u_i \ \forall\, 1 \leq i \leq n).$$

Slepian's Lemma says that decreasing the correlations of normal random variables makes
them stochastically larger. It allows one to replace complicated correlations with simpler
ones and it has the great advantage that it is never worse than trivial.

Using Slepian’s Lemma we can prove the following useful bound. 


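Normal Comparison Result 2. Suppose that $n \geq 2$ and that $\epsilon > 0$ is sufficiently small.
Let $X_1, \ldots, X_n$ be mean zero, variance one, jointly normal random variables, and suppose that
$\mathbb{E}X_iX_j \leq \epsilon$ whenever $i \neq j$. Then for any $A \geq 1$ and any $B > 0$ we have

$$\mathbb{P}\Big( \max_{1 \leq i \leq n} X_i \leq A \Big) \ll \exp\Big( -\Theta\Big( \frac{n\, e^{-A^2/2 + O(\epsilon A^2 + AB + B^2)}}{A + B} \Big) \Big) + e^{-B^2/\epsilon}.$$

In particular, for any $100\epsilon \leq \delta \leq 1/100$ (say) we have

$$\mathbb{P}\Big( \max_{1 \leq i \leq n} X_i \leq \sqrt{(2-\delta)\log n} \Big) \ll e^{-\Theta(n^{\delta/20}/\sqrt{\log n})} + n^{-\delta^2/(50\epsilon)}.$$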

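Proof of Normal Comparison Result 2. In view of Slepian's Lemma, the probability is at
most as large as with the $X_i$ replaced by $W_i$, where $\mathbb{E}W_iW_j = \epsilon$ whenever $i \neq j$. However,
it is well known that one can explicitly construct random variables with this covariance
structure, by letting $Z_0, Z_1, \ldots, Z_n$ be independent standard normal random variables, and
then taking

$$W_i = \sqrt{\epsilon} Z_0 + \sqrt{1-\epsilon} Z_i \quad \forall\, 1 \leq i \leq n.$$

So conditioning on the value of $Z_0$ (and using the symmetry of its distribution), we deduce that

$$\mathbb{P}\Big( \max_{1 \leq i \leq n} X_i \leq A \Big) \leq \mathbb{P}\Big( \max_{1 \leq i \leq n} W_i \leq A \Big) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \Phi\Big( \frac{A + \sqrt{\epsilon}\, y}{\sqrt{1-\epsilon}} \Big)^n e^{-y^2/2}\, dy,$$

where $\Phi$ denotes the standard normal cumulative distribution function.

Splitting the integral at $y = B\sqrt{2/\epsilon}$, we find it is

$$(4.1) \qquad \leq \Phi\Big( \frac{A + B\sqrt{2}}{\sqrt{1-\epsilon}} \Big)^n + \frac{1}{\sqrt{2\pi}} \int_{B\sqrt{2/\epsilon}}^{\infty} e^{-y^2/2}\, dy \leq \Phi\Big( \frac{A + B\sqrt{2}}{\sqrt{1-\epsilon}} \Big)^n + e^{-B^2/\epsilon}.$$

Moreover, for any $z \geq 1$ we have

$$\Phi(z) = 1 - \frac{1}{\sqrt{2\pi}} \int_z^{\infty} e^{-y^2/2}\, dy \leq 1 - \Theta\Big( \frac{e^{-z^2/2}}{z} \Big),$$

and so (since $\epsilon$ is small) we deduce

$$(4.2) \qquad \Phi\Big( \frac{A + B\sqrt{2}}{\sqrt{1-\epsilon}} \Big) \leq 1 - \Theta\Big( \frac{e^{-(1+2\epsilon)(A + B\sqrt{2})^2/2}}{A + B} \Big),$$

from which the first bound claimed in Normal Comparison Result 2 follows.

The second bound follows on taking $A = \sqrt{(2-\delta)\log n}$ and $B = \delta\sqrt{(\log n)/50}$ in (4.1)
and (4.2), and noting then that

$$n\, \frac{e^{-(1+2\epsilon)(A + B\sqrt{2})^2/2}}{A + B} \gg n\, \frac{e^{-(1+2\epsilon)A^2/2 - 2AB}}{A + B} \gg n\, \frac{e^{-(1+\delta/50)(1-\delta/2)\log n - (2/5)\delta \log n}}{\sqrt{\log n}} \gg \frac{n^{\delta/20}}{\sqrt{\log n}}.$$

□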
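
The one-factor construction $W_i = \sqrt{\epsilon} Z_0 + \sqrt{1-\epsilon} Z_i$ makes the maximum of equicorrelated Gaussians computable by a one-dimensional integral, which is easy to check numerically. The sketch below is an illustration only: the function names are hypothetical, the integral is approximated by a plain trapezoid rule on a truncated range (an assumed discretisation), and the bound compared against is the right hand side of the splitting step with all constants as written there.

```python
import math


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_max_below(n, eps, A, steps=20000, y_max=10.0):
    """P(max_i W_i <= A) for W_i = sqrt(eps) Z_0 + sqrt(1 - eps) Z_i, by conditioning on Z_0."""
    h = 2.0 * y_max / steps
    total = 0.0
    for k in range(steps + 1):
        y = -y_max + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        inner = Phi((A + math.sqrt(eps) * y) / math.sqrt(1.0 - eps))
        total += weight * inner**n * math.exp(-0.5 * y * y)
    return total * h / math.sqrt(2.0 * math.pi)


def split_bound(n, eps, A, B):
    """Upper bound obtained by splitting the integral at y = B * sqrt(2 / eps)."""
    main = Phi((A + B * math.sqrt(2.0)) / math.sqrt(1.0 - eps)) ** n
    tail = math.exp(-B * B / eps)
    return main + tail
```

A convenient check is the case $n = 2$, $A = 0$, $\epsilon = 1/2$, where the exact value is $1/4 + \arcsin(1/2)/(2\pi) = 1/3$.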

The above results only give one sided bounds on probabilities, whereas in our theorems 
we want to show probabilities are equal up to a small error. The following result can supply 
such estimates in some cases. 


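Normal Comparison Result 3 (See Theorem 2.1 of Li and Shao [H]). Let $X = (X_i)_{1 \leq i \leq n}$
and $W = (W_i)_{1 \leq i \leq n}$ be vectors of jointly normal, mean zero, variance one random variables.
For each pair $i,j$ define $\rho_{i,j} := \max\{|\mathbb{E}X_iX_j|, |\mathbb{E}W_iW_j|\}$. Then for any real numbers $u_1, \ldots, u_n$
we have

$$\mathbb{P}(X_i \leq u_i \ \forall\, 1 \leq i \leq n) - \mathbb{P}(W_i \leq u_i \ \forall\, 1 \leq i \leq n) \leq \frac{1}{2\pi} \sum_{\substack{1 \leq i < j \leq n, \\ \mathbb{E}X_iX_j > \mathbb{E}W_iW_j}} \big( \arcsin(\mathbb{E}X_iX_j) - \arcsin(\mathbb{E}W_iW_j) \big) \exp\Big( -\frac{u_i^2 + u_j^2}{2(1 + \rho_{i,j})} \Big).$$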

Note that by swapping the roles of X and W one can obtain two sided bounds from this 
result. 

A problem with Normal Comparison Result 3 is that the probabilities on the left may 
themselves be very small, and so the bound on the right may be worse than trivial. We know 
of no result that can overcome this difficulty in general, and a major issue in proving our 
theorems will be arranging things so that we only apply Normal Comparison Result 3 to 
probabilities that are fairly large, as discussed in the introduction. 
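
To make the shape of the bound in Normal Comparison Result 3 concrete, here is a small sketch that evaluates its right hand side for given correlation matrices. The function names are hypothetical, and correlation matrices are passed as nested lists (an assumed input format). In dimension two with $u_1 = u_2 = 0$ the orthant probability is known in closed form, $1/4 + \arcsin(\rho)/(2\pi)$, so one can see that the bound is attained with equality there.

```python
import math


def orthant_probability(rho):
    """Exact P(X_1 <= 0, X_2 <= 0) for standard normals with correlation rho."""
    return 0.25 + math.asin(rho) / (2.0 * math.pi)


def li_shao_bound(corr_x, corr_w, u):
    """Right hand side of the comparison inequality for thresholds u."""
    n = len(u)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            rx, rw = corr_x[i][j], corr_w[i][j]
            if rx > rw:  # only pairs where X is more correlated than W contribute
                rho = max(abs(rx), abs(rw))
                weight = math.exp(-(u[i] ** 2 + u[j] ** 2) / (2.0 * (1.0 + rho)))
                total += (math.asin(rx) - math.asin(rw)) * weight
    return total / (2.0 * math.pi)
```

When every correlation of $X$ is at most the corresponding correlation of $W$ the sum is empty and the bound is zero, which recovers the one sided statement of Slepian's Lemma.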


4.3. A result on leaders. In this subsection we shall establish Theorem 1.5. To this end,
we shall first prove the following.


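Lemma 4.4. Let the situation be as in Theorem 1.5. Then for any $x \geq 1$ we have

$$\Big| \mathbb{P}\Big( \max_{2 \leq i \leq n} X_i \leq x \,\Big|\, X_1 = x \Big) - \Phi(x)^{n-1} \Big| \ll x e^{-x^2(1+O(\epsilon))/2} \sum_{2 \leq i \leq n} |r_{1,i}| + e^{-x^2(1+O(\epsilon))} \sum_{2 \leq i < j \leq n} |r_{i,j}|.$$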

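Proof of Lemma 4.4. Consider the transformed random variables

$$V_i := \frac{X_i - r_{1,i} X_1}{\sqrt{1 - r_{1,i}^2}}, \qquad 2 \leq i \leq n.$$

It is easy to check that these are all standard normal random variables, and they satisfy

$$\mathbb{E}V_iX_1 = 0 \ \ \forall\, 2 \leq i \leq n, \qquad \mathbb{E}V_iV_j = \frac{r_{i,j} - r_{1,i}r_{1,j}}{\sqrt{(1 - r_{1,i}^2)(1 - r_{1,j}^2)}} = O(|r_{i,j}| + |r_{1,i}||r_{1,j}|) = O(\epsilon) \ \ \forall\, i \neq j.$$

In particular, since the $V_i$ are uncorrelated with $X_1$ they are independent from $X_1$, and so
we see

$$\mathbb{P}\Big( \max_{2 \leq i \leq n} X_i \leq x \,\Big|\, X_1 = x \Big) = \mathbb{P}\Big( V_i \leq \frac{x - r_{1,i}x}{\sqrt{1 - r_{1,i}^2}} \ \forall\, 2 \leq i \leq n \Big) = \mathbb{P}\Big( V_i \leq x\sqrt{\frac{1 - r_{1,i}}{1 + r_{1,i}}} \ \forall\, 2 \leq i \leq n \Big).$$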
Next, using Normal Comparison Result 3 twice (with $X$ and $W$ swapped the second time)
to compare the $V_i$ with independent standard normals we obtain

$$\Big| \mathbb{P}\Big( V_i \leq x\sqrt{\frac{1 - r_{1,i}}{1 + r_{1,i}}} \ \forall\, 2 \leq i \leq n \Big) - \prod_{2 \leq i \leq n} \Phi\Big( x\sqrt{\frac{1 - r_{1,i}}{1 + r_{1,i}}} \Big) \Big| \ll \sum_{2 \leq i < j \leq n} |\mathbb{E}V_iV_j| \exp\Big( -x^2\, \frac{((1 - r_{1,i})/(1 + r_{1,i})) + ((1 - r_{1,j})/(1 + r_{1,j}))}{2(1 + |\mathbb{E}V_iV_j|)} \Big)$$

$$= O\Big( e^{-x^2(1+O(\epsilon))} \sum_{2 \leq i < j \leq n} \big( |r_{1,i}||r_{1,j}| + |r_{i,j}| \big) \Big).$$

Notice here that the contribution from $|r_{i,j}|$ is acceptable for Lemma 4.4, and the contribution
from $|r_{1,i}||r_{1,j}|$ may be rewritten as

$$O\Big( \Big( e^{-x^2(1+O(\epsilon))/2} \sum_{2 \leq i \leq n} |r_{1,i}| \Big)^2 \Big).$$

If this term is smaller than one then it is acceptable because it is smaller than the first error
term in Lemma 4.4, and if it is bigger than one then so is the first error term in Lemma 4.4,
so the lemma is trivially true.

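To finish it will suffice to prove that

$$\Big| \prod_{2 \leq i \leq n} \Phi\Big( x\sqrt{\frac{1 - r_{1,i}}{1 + r_{1,i}}} \Big) - \Phi(x)^{n-1} \Big| \ll x e^{-x^2(1+O(\epsilon))/2} \sum_{2 \leq i \leq n} |r_{1,i}|.$$

But this follows simply because we have

$$\Phi\Big( x\sqrt{\frac{1 - r_{1,i}}{1 + r_{1,i}}} \Big) - \Phi(x) = O\Big( x |r_{1,i}| e^{-x^2(1+O(\epsilon))/2} \Big)$$

for each $2 \leq i \leq n$, on noting that $\sqrt{(1 - r_{1,i})/(1 + r_{1,i})} = 1 + O(|r_{1,i}|) = 1 + O(\epsilon)$ for all such $i$. □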
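
The decorrelation step used in the proof of Lemma 4.4 is plain linear algebra on the correlation matrix, and can be verified mechanically. The sketch below is illustrative only: the helper names are hypothetical, the correlation matrix is a nested list, and index 0 in the code plays the role of $X_1$ in the text. It computes covariances of linear combinations through the bilinear form $a^T R\, b$.

```python
import math


def bilinear(a, R, b):
    """Covariance of sum_p a[p] X_p and sum_q b[q] X_q when X has covariance matrix R."""
    return sum(a[p] * R[p][q] * b[q] for p in range(len(a)) for q in range(len(b)))


def residual_coefficients(R, i):
    """Coefficients of V_i = (X_i - r X_0) / sqrt(1 - r^2), where r = R[0][i]."""
    r = R[0][i]
    scale = math.sqrt(1.0 - r * r)
    coeffs = [0.0] * len(R)
    coeffs[i] = 1.0 / scale
    coeffs[0] = -r / scale
    return coeffs


def residual_correlation(R, i, j):
    """Correlation of V_i and V_j, computed directly from the definition."""
    return bilinear(residual_coefficients(R, i), R, residual_coefficients(R, j))
```

With a small example matrix one can confirm that each $V_i$ has variance one, is uncorrelated with the conditioning variable, and has pairwise correlation $(r_{i,j} - r_{1,i}r_{1,j})/\sqrt{(1-r_{1,i}^2)(1-r_{1,j}^2)}$.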

Proof of Theorem 1.5. Lemma 4.4 looks very close to Theorem 1.5, but it doesn't immediately
yield the theorem because it is quite weak (possibly worse than trivial) unless $x$ is fairly
large. (As $x$ becomes smaller we expect the probabilities on the left to become very small,
whereas the bound on the right becomes larger.) Fortunately we can deal with the case of
small $x$ using Normal Comparison Result 2.

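Indeed, if we choose $\delta = 1/1000$, say, then we clearly have

$$\Big| \mathbb{P}\Big( X_1 > \max_{2 \leq i \leq n} X_i \Big) - \mathbb{P}\Big( X_1 > \max_{2 \leq i \leq n} X_i, \text{ and } X_1 > \sqrt{(2-\delta)\log n} \Big) \Big| \leq \mathbb{P}\Big( \max_{2 \leq i \leq n} X_i \leq \sqrt{(2-\delta)\log n} \Big),$$

and by Normal Comparison Result 2 the right hand side is

$$\ll e^{-\Theta(n^{1/20000}/\sqrt{\log n})} + n^{-1/(50000000\epsilon)} \ll n^{-100},$$

since $\epsilon$ is assumed to be small enough. We would have the same bound if the $X_i$ were
replaced by independent standard normals $\tilde{X}_i$, and in that case we have $\mathbb{P}(\tilde{X}_1 > \max_{2 \leq i \leq n} \tilde{X}_i) = 1/n$
by symmetry, so to prove Theorem 1.5 it will suffice to show that

$$\Big| \mathbb{P}\Big( X_1 > \max_{2 \leq i \leq n} X_i, \text{ and } X_1 > \sqrt{(2-\delta)\log n} \Big) - \mathbb{P}\Big( \tilde{X}_1 > \max_{2 \leq i \leq n} \tilde{X}_i, \text{ and } \tilde{X}_1 > \sqrt{(2-\delta)\log n} \Big) \Big| \ll n^{-1.99} \sum_{2 \leq i \leq n} |r_{1,i}| + n^{-2.99} \sum_{2 \leq i < j \leq n} |r_{i,j}|.$$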

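But using Lemma 4.4 we have

$$\Big| \mathbb{P}\Big( \tilde{X}_1 > \max_{2 \leq i \leq n} \tilde{X}_i, \text{ and } \tilde{X}_1 > \sqrt{(2-\delta)\log n} \Big) - \mathbb{P}\Big( X_1 > \max_{2 \leq i \leq n} X_i, \text{ and } X_1 > \sqrt{(2-\delta)\log n} \Big) \Big|$$

$$= \Big| \int_{\sqrt{(2-\delta)\log n}}^{\infty} \frac{e^{-x^2/2}}{\sqrt{2\pi}} \Big( \mathbb{P}\Big( \max_{2 \leq i \leq n} X_i < x \,\Big|\, X_1 = x \Big) - \Phi(x)^{n-1} \Big) dx \Big|$$

$$\ll \int_{\sqrt{(2-\delta)\log n}}^{\infty} e^{-x^2/2} \Big( x e^{-x^2(1+O(\epsilon))/2} \sum_{2 \leq i \leq n} |r_{1,i}| + e^{-x^2(1+O(\epsilon))} \sum_{2 \leq i < j \leq n} |r_{i,j}| \Big) dx.$$

The Theorem follows on remembering that $\delta = 1/1000$ and $\epsilon$ is sufficiently small.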


□ 
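
The symmetry fact used above, that a given one of $n$ independent standard normals is the leader with probability $1/n$, corresponds to the identity $\int_{-\infty}^{t} \varphi(x)\Phi(x)^{n-1}\,dx = \Phi(t)^n/n$ (here $\varphi$ denotes the standard normal density). The following sketch checks it numerically; the function names and the trapezoid discretisation are illustrative assumptions, not part of the argument.

```python
import math


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def leader_integral(n, lower=-10.0, upper=10.0, steps=20000):
    """Trapezoid approximation of the integral of density(x) * Phi(x)^(n-1) over [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lower + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += weight * density * Phi(x) ** (n - 1)
    return total * h
```

Restricting the range of integration to large $x$, as in the proof, simply removes the portion $\Phi(t)^n/n$ coming from $x \leq t$.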


4.4. Preparation for the $k$-contestant theorems. In this subsection we shall develop
two lemmas that will be required later for the proof of Theorem 1.6. We present these in a
moderate amount of generality, and they might be of wider interest.

We begin with a kind of hybrid normal comparison inequality.


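Lemma 4.5. Let $(W_i)_{i \in \mathcal{I}}$ be a finite collection of jointly standard normal random variables,
let $\epsilon \geq \epsilon_1 > 0$ be small, and suppose that

$$|\mathbb{E}W_iW_j| \leq \epsilon \ \ \forall\, i \neq j, \quad \text{and} \quad \mathbb{E}W_iW_j \leq \epsilon_1 \ \ \forall\, i \neq j.$$

Then if $w \geq 1$, and $w/2 \leq w_i \leq 2w$ for all $i \in \mathcal{I}$, we have

$$\Big| \mathbb{P}(W_i \leq w_i \ \forall\, i \in \mathcal{I}) - \prod_{i \in \mathcal{I}} \Phi(w_i) \Big| \ll \Big( \prod_{i \in \mathcal{I}} \Phi\big( (1 + O(\epsilon)) w_i \big) + e^{-\Theta(\epsilon^2 w^2/(\epsilon_1 + \epsilon^2))} \Big) \sum_{\substack{i,j \in \mathcal{I}, \\ i \neq j}} |\mathbb{E}W_iW_j|\, e^{-(1/2 + O(\epsilon))(w_i^2 + w_j^2)}.$$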

The key advantage of Lemma 4.5, as opposed to a result like Normal Comparison Result
3, is the presence of the bracketed prefactor on the right hand side, which can make the
inequality much more powerful if $\prod_{i \in \mathcal{I}} \Phi(w_i)$ is small. Notice also that if one has a stronger
upper bound for $\mathbb{E}W_iW_j$ than an absolute value bound (i.e. if $\epsilon_1$ is appreciably smaller than
$\epsilon$), then the second term in the bracket is improved. This should not be too surprising, since
Slepian's Lemma (Normal Comparison Result 1) implies that decreasing the correlations of
normal random variables decreases the probability that they are all simultaneously small.
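Proof of Lemma 4.5. If one looks inside the proofs of the usual normal comparison lemmas
(see e.g. Li and Shao's paper [E]), they yield that

$$\Big| \mathbb{P}(W_i \leq w_i \ \forall\, i \in \mathcal{I}) - \prod_{i \in \mathcal{I}} \Phi(w_i) \Big| \ll \sum_{\substack{i,j \in \mathcal{I}, \\ i \neq j}} P_{i,j} |\mathbb{E}W_iW_j|\, e^{-(w_i^2 + w_j^2)/2(1 + |\mathbb{E}W_iW_j|)} \ll \sum_{\substack{i,j \in \mathcal{I}, \\ i \neq j}} P_{i,j} |\mathbb{E}W_iW_j|\, e^{-(1/2 + O(\epsilon))(w_i^2 + w_j^2)},$$

where

$$P_{i,j} := \sup_{0 \leq h \leq 1} \mathbb{P}\big( W_k^{(h)} \leq w_k \ \forall\, k \in \mathcal{I} \,\big|\, W_i^{(h)} = w_i, W_j^{(h)} = w_j \big),$$

and the $W_k^{(h)}$ are mean zero, variance one, jointly normal random variables with correlations

$$\mathbb{E}W_k^{(h)} W_l^{(h)} = h\, \mathbb{E}W_kW_l, \qquad k \neq l.$$

(Thus, in particular, when $h = 1$ the $W_k^{(h)}$ are simply the $W_k$, and when $h = 0$ the $W_k^{(h)}$ are
independent standard normal random variables.) We will show that

$$\mathbb{P}(W_k \leq w_k \ \forall\, k \in \mathcal{I} \,|\, W_i = w_i, W_j = w_j) \ll \prod_{k \in \mathcal{I}} \Phi\big( (1 + O(\epsilon)) w_k \big) + e^{-\Theta(\epsilon^2 w^2/(\epsilon_1 + \epsilon^2))}$$

for any $i \neq j$. Exactly the same argument would yield the corresponding estimate for any
$0 \leq h \leq 1$ (uniformly over $h$), thus proving Lemma 4.5.

Indeed, if we define

$$V_k := W_k - \frac{(\mathbb{E}W_kW_i - \mathbb{E}W_kW_j \mathbb{E}W_iW_j)}{1 - (\mathbb{E}W_iW_j)^2} W_i - \frac{(\mathbb{E}W_kW_j - \mathbb{E}W_kW_i \mathbb{E}W_iW_j)}{1 - (\mathbb{E}W_iW_j)^2} W_j \qquad \forall\, k \in \mathcal{I} \setminus \{i,j\},$$

then one can check that $\mathbb{E}V_kW_i = \mathbb{E}V_kW_j = 0$, and so if $k,l \in \mathcal{I} \setminus \{i,j\}$ then

$$\mathbb{E}V_kV_l = \mathbb{E}V_kW_l - 0 - 0 = \mathbb{E}V_kW_l = \mathbb{E}W_kW_l - \mathbb{E}W_kW_i \mathbb{E}W_iW_l - \mathbb{E}W_kW_j \mathbb{E}W_jW_l + O(\epsilon^3).$$

In particular, it follows that $\mathbb{E}V_k^2 = 1 + O(\epsilon^2)$. One can also check that when $k \neq l$ we have
the upper bound

$$\mathbb{E}V_kV_l \leq \epsilon_1 + 2\epsilon\epsilon_1 + O(\epsilon^3) \leq (C/2)(\epsilon_1 + \epsilon^2),$$

for a suitable positive constant $C$, since $\mathbb{E}W_kW_l \leq \epsilon_1$ and if $-\mathbb{E}W_kW_i \mathbb{E}W_iW_l$ is positive then
one of the factors must be positive and one negative, so one has size at most $\epsilon_1$ and the other
has size at most $\epsilon$.

Now if we set $\tilde{V}_k := V_k/\sqrt{\mathbb{E}V_k^2}$, then $\mathbb{P}(W_k \leq w_k \ \forall\, k \in \mathcal{I} \,|\, W_i = w_i, W_j = w_j)$ is

$$= \mathbb{P}\Big( V_k \leq w_k - \frac{(\mathbb{E}W_kW_i - \mathbb{E}W_kW_j \mathbb{E}W_iW_j)}{1 - (\mathbb{E}W_iW_j)^2} w_i - \frac{(\mathbb{E}W_kW_j - \mathbb{E}W_kW_i \mathbb{E}W_iW_j)}{1 - (\mathbb{E}W_iW_j)^2} w_j \ \ \forall\, k \in \mathcal{I} \setminus \{i,j\} \Big)$$

$$= \mathbb{P}\big( \tilde{V}_k \leq (1 + O(\epsilon)) w_k \ \forall\, k \in \mathcal{I} \setminus \{i,j\} \big),$$

where the first equality uses the fact that $V_k$ is uncorrelated with, and therefore independent
of, $W_i$ and $W_j$ (so we can remove the conditioning), and the second equality uses the fact
that $w_i, w_j \asymp w \asymp w_k$.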










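Now the $\tilde{V}_k$ are jointly standard normal random variables with off-diagonal correlations
that are $\leq (1 + O(\epsilon^2)) \mathbb{E}V_kV_l \leq C(\epsilon_1 + \epsilon^2)$, so by Slepian's Lemma (Normal Comparison
Result 1) the probability is at most as large as if all the off-diagonal correlations were equal
to $C(\epsilon_1 + \epsilon^2)$. Arguing as in the proof of Normal Comparison Result 2, it follows that

$$\mathbb{P}\big( \tilde{V}_k \leq (1 + O(\epsilon)) w_k \ \forall\, k \in \mathcal{I} \setminus \{i,j\} \big) \leq \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \prod_{k \in \mathcal{I} \setminus \{i,j\}} \Phi\Big( \frac{(1 + O(\epsilon)) w_k + \sqrt{C(\epsilon_1 + \epsilon^2)}\, y}{\sqrt{1 - C(\epsilon_1 + \epsilon^2)}} \Big) e^{-y^2/2}\, dy,$$

and Lemma 4.5 follows on splitting the integral at $y = \epsilon w/\sqrt{C(\epsilon_1 + \epsilon^2)}$. □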

We hnish this section by applying Lemma 14.51 to prove the following general estimate for 
P(maxfc+i<j<„Xj < Xk\Xi = xi, ■■■,Xk = Xk), which is what we shall actually use later when 
proving Theorem 11.61 



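Lemma 4.6. Let $X_1, \ldots, X_n$ be mean zero, variance one, jointly normal random variables,
and write $r_{i,j} := \mathbb{E}X_iX_j$. Let $\epsilon \geq \epsilon_1, \epsilon_2 > 0$ be sufficiently small, and suppose that

$$|r_{i,j}| \leq \epsilon \ \ \forall\, i \neq j, \quad \text{and} \quad r_{i,j} \leq \epsilon_1 \ \ \forall\, i \neq j.$$


Let $1 \leq k \leq n-2$ be such that $\epsilon k$ is sufficiently small, and suppose further that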

k k / k 

E E 

1 = 1 S = l,S^l 




,^l+ <^2 




l<t,u<k, 

t^u 


for any distinct k + 1 < i, j < n. 

Then for any real numbers xi, ...,Xk-i > Xk > I such that ^ 1/^; have 


P( max Xi < Xk\Xi = xi, ...,Xk = Xk) - TT $ (w*) 

k~\~ 1 ^ 2^74 

i=k-\-l 

( 4 . 3 ) < ( f[ ^i{l + 0{e))wi) + e-^i^^^'^^"Ln+e2+M)) 

\i=k-{-l 

k / n 

k-\-l<i<j<n 1=1 \i=k-^l 

for certain numbers Wi that satisfy 

k 

Wi = (1 -h 0{e^k))xk + o(^'^xi\ri^i\j = (^ + 0{\/ek))xk. 

1=1 


X 


The reader may wish to compare Lemma 4.6 with Lemma 4.4. The first bracketed term
on the right hand side of Lemma 4.6 will be crucial when we come to apply the lemma to
prime number races with $k$ large, since on the relevant range of $x_1, \ldots, x_k$ it will turn out that
$\prod_{i=k+1}^{n} \Phi(w_i)$ is rather small (in fact of size roughly e
Proof of Lemma 4.6. Note first that provided $\epsilon k < 1/2$, Lemma 3.3 implies that the
covariance (sub-)matrix $A := (r_{l,s})_{1 \leq l,s \leq k}$ is invertible. Let $(\tilde{r}_{l,s})_{1 \leq l,s \leq k} := A^{-1}$ denote the inverse
matrix, and for all $k+1 \leq i \leq n$ and all $1 \leq l \leq k$ set

$$u_{i,l} := \sum_{s=1}^{k} \tilde{r}_{l,s} r_{i,s} \quad \text{and} \quad V_i := X_i - \sum_{l=1}^{k} u_{i,l} X_l.$$

Then the random variables $V_i$ are zero mean, jointly normal random variables, and for any
$1 \leq t \leq k$ they satisfy

$$\mathbb{E}V_iX_t = r_{i,t} - \sum_{l=1}^{k} u_{i,l} r_{l,t} = r_{i,t} - \sum_{1 \leq l,s \leq k} r_{i,s} \tilde{r}_{l,s} r_{l,t} = r_{i,t} - \sum_{1 \leq s \leq k} r_{i,s} \mathbf{1}_{s=t} = 0.$$

In other words, the random variables $V_i$ are uncorrelated with, and therefore independent
from, all of $X_1, X_2, \ldots, X_k$. Let us also note that for any $k+1 \leq i \leq n$ and any $1 \leq l \leq k$,
using Lemma 3.3 to estimate the $\tilde{r}_{l,s}$ yields that

(4.4) 


k 


( 


\\ 


Ui,i = (l + oU ^ 


\rt,i 




O 


E 


\ri,s 


+ 


\rt,z 


l<t,u<k, 

t^u 


\ sA V 


t^l,s 


l<t,u<k, 

j j 


and using our assumptions that the off-diagonal covariances are bounded by e, and that ek 
is small, it follows in particular that 


(4.5) 


Wi,l\ < \ri,t\ + In,si <e e. 


s=l, 

s^l 


We also see that 

(4.6) EI /.2 = EViXi - 0 = EViXi = 1 + O I e ^ \uif ) = 1 + 0{e‘^k). 


1=1 


In view of the above discussion, if we rewrite the event maxfc+i<j<„ Xi < Xk in terms of 
the Vi so that we can remove the conditioning, we hnd 

max Xi < Xk\Xi = xi, ...,Xk = xA = E \Vi < Xk - xiUii Wk + 1 < i < n 

'k+l<i<n \ ^ 


l=l 


= P {Wi < Wi yk + 1 < i < n) 


where Wi 





and Wi 


A — are mean zero, variance one, iointly 

’ ) J J 


normal random variables. Proceeding to estimate the off-diagonal correlations, we note hrst 
that 


EWiWj = (1 + 0{e^k))EViVj = (1 + 0{e^k))EXiVj, 
and then from (4.4) we deduce that


(4.7) 


EWiWj = (1 + 0(eT))r 




( 


o 


k k 

1 = 1 S = l,s^l 


k 

- (l + o(^e^k + e ^ \rt,u\)) 

t^U 


( I^Ltllo.tl + e^( 

^ l’•‘.“l) 

\ t=l,t^l,S 

l<t,u<k, 

t^u 


\ 


Since \rij\ < e and Vij < ei when i ^ j, and since ek is small, it follows in particnlar that 
whenever i ^ j we have |Ehhjhhj| = 0(e), and 


Ehhjhhj ^ (1 + 0(e^fc))ei + (1 + 0(e^/c^))/ceei + 0 ( 62 ) — 0(ei + 62 ). 

Finally, using that — \J ^ Ym=i ^ \/W^ (which follows from the Cauchy- 

Schwarz inequality) together with fl4.5p we hnd 

k k / k 

Wi (1 “t“ 0(e O^ ^ ^ Xi I + Vek'^ = (1 + 0{e^k))xk + O 5 : xi\ri^i 

1=1 s=i \ 1=1 

Here the hnal equality follows because the xi are all > 1 whereas ek is small. We also 
have ^ V^, and therefore Wi = (l + 0[Ve^)xk, and hence 

Xk/2 < Wi < 2xk for all fc + 1 < i < n. 

This all means that Lemma 1431 is applicable, with ei replaced by 0 ( 61 + 62 ) and w replaced 
by Xk- Lemma [4.61 follows from Lemma [4.51 on noting that 



(1/2 + 0(6)) (w/ + w|) = (1 + 0{Vek))xl, 


and using fl4.7p to get 

^ |Evr.vr,|« ^ 

k-\-l<i<j<n k-\-l<i<j<n 


k k 


hJ 


1 = 1 S = l^s^l 


1=1 


k n 


■c 


E I'-wi + E E i’'«i +mE E 


|0,z| 


k-\-l<i<j<n 1=1 \i=k-\-l 

k / n ^ 

k^l<i<j<n 1=1 Vi^fc+l 


^ ^=1 i=k-\-l 


Here the hnal line uses the Cauchy-Schwarz inequality in the form J2i=k+i l+ll j — 
5. The full prime number race: Proof of Theorem 1.2


Suppose $q$ is large, and let $Z = (Z_j)_{1 \leq j \leq n}$ denote a multivariate normal random vector
whose components have mean zero, variance one, and correlations $r_{i,j} = \mathbb{E}Z_iZ_j = B_q(a_i,a_j)/\mathrm{Var}(q)$.

Let $S_1, S_2 \subset \{1, 2, \ldots, n\}$ be non-empty. Then Correlation Estimate 1 implies that


(5.1) 






eeSi,seS2 

l^S 


V|^l|-|g2|log^ (2|gl| • 1^21) 
logg 


and E \ri,s\ < 


l<^,s<n, 

i^s 


nlog^(2n) 

logg 


Let $C = (r_{i,j})_{1 \leq i,j \leq n}$ be the covariance matrix of $Z_1, \ldots, Z_n$. In view of the pointwise bound
$|r_{i,j}| \ll 1/\log q$ (for $i \neq j$) from Lemma 2.2, we may apply Lemma 3.3 with $\epsilon \asymp 1/\log q$
provided that $n$ is at most a certain constant times $\log q$, and obtain that

n(logn)^' 


(5.2) 
and 

(6.3) 


det(C) = 1 -|- O 


(logg)^ 


1 + 0 


r^,s = 


n(log n)^ 
(log*?) 2 


o ( k,,,| + El<& 




(logg)3 


if£ = s 
if £ 7^ s. 


where $\tilde{r}_{i,j}$ denote the entries of the inverse matrix $C^{-1}$.

Now the key ingredient in the proof of Theorem 1.2 is the following proposition, which
gives an approximation for the joint density function of $Z_1, \ldots, Z_n$. Let $f(x_1, \ldots, x_n)$ be this
density, and define the Euclidean norm $\|x\| := (x_1^2 + \cdots + x_n^2)^{1/2}$.

Proposition 5.1. Let $2 \leq n \ll \log q$ be an integer. For any $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$, we have

n(logn)^' 


/(xi,... ,x„) = ( 1 + O 

1 


(logg)^ 


X 


■ exp 


X 


Proof. By dehnition and by fl5.2p we have 

1 


1 + 0 


(logn)"^ n(logn)® ?7,^(logn)^ 

+ /. \o + 


logg 


(log qy 


(logg)^ 


/(xi,... ,x„) = 


exp 


(5.4) 

and using 

(5.5) 


( 27 r )+2 .^(;[et(C) 

n(logn)^ 


x^C-'x 


= 1 + 0 


(log qf 


we obtain 


x^C 'x= ^ 


XiXgrps = IfI 


l<£,s<n 


1 + 0 


n(logn)^ 
(log g)2 


exp ( “2^^^ ) ’ 


XeXsTl^s, 

l<i^s<n 































and also that 
(5.6) 

l<£^s<n 






jlr sj 


l<£7^s<n 


l<j<n 


(log qf 


\<i^s<n 


Now let $J := \lfloor 2 \log n \rfloor$, and for each $1 \leq j \leq J-1$ define $S_j$ to be the subset of $\{1, \ldots, n\}$
consisting of those $\ell$ for which $\|x\|/2^j < |x_\ell| \leq \|x\|/2^{j-1}$. Also let $S_J$ be the set of those
$\ell$ for which $|x_\ell| \leq \|x\|/2^{J-1}$. Then note that for all $1 \leq j \leq J$ we have $|S_j| \leq 2^{2j}$, since
$|S_J| \leq n \leq 2^{2J}$ and if $j < J$, then $|S_j| \cdot \|x\|^2/2^{2j} < \sum_{\ell \in S_j} x_\ell^2 \leq \|x\|^2$. Using (5.1) we deduce that

l<l^s<n 


X 




T, ^ Y1 

i<Lj<a ieSi,seSi 

l^s 


(logn)^ 

< ||x| 


Similarly we have 


\xiXs\ Y < Y 


/ 


Y 


l<e<n, 

V 




/ 


logg 


« E 

l<j<n 




E ^ E I>'6. 


l<i<J 

6 


eeSi, 




/ 


n(logn) 

< NX 


(logg)2 

since ([5ll]) implies that \re,j\ < (V](^log^(2|S'i|))/logg < 2TVlogg. Finally, by the 

^¥=j 

Canchy-Schwarz ineqnality we have X]i<£^s<n hence the contribntion 

of the third term in the right hand side of fl5.6p is <C n^(logn)^| |x| p/(log g)^. Collecting the 
above estimates completes the proof. □ 
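
The dyadic decomposition of the coordinates used in the proof above is easy to experiment with. The sketch below is illustrative only (the function name `dyadic_classes` is hypothetical): it sorts the indices of a vector $x$ into classes $S_1, \ldots, S_J$ according to the size of $|x_\ell|$ relative to $\|x\|$, with $J = \lfloor 2 \log n \rfloor$ and natural logarithms, so that the counting bound $|S_j| \leq 2^{2j}$ can be checked on examples.

```python
import math
import random


def dyadic_classes(x):
    """Split indices of x into classes S_1..S_J by the dyadic size of |x_l| relative to ||x||.

    Class j < J holds indices with ||x||/2^j < |x_l| <= ||x||/2^(j-1);
    class J holds the remaining (smallest) coordinates. Returned as a list
    whose entry j-1 is the list of indices in S_j.
    """
    n = len(x)
    norm = math.sqrt(sum(v * v for v in x))
    J = max(1, math.floor(2.0 * math.log(n)))
    classes = [[] for _ in range(J)]
    for idx, v in enumerate(x):
        j = J  # default: the class of smallest coordinates
        for cand in range(1, J):
            if abs(v) > norm / 2.0**cand:
                j = cand  # smallest cand with |x_l| > ||x|| / 2^cand
                break
        classes[j - 1].append(idx)
    return classes
```

Each class with $j < J$ can contain fewer than $4^j$ indices because its members each contribute more than $\|x\|^2/4^j$ to $\|x\|^2$, which is the observation that lets Correlation Estimate 1 be applied class by class.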

Proof of Theorem \1.2[ By Proposition Ih.ip if 2 < n < logg/(loglogg)"^ then 
P(Zi > Z2 > ■ ■ ■ > Zn) = / /(xi, . . . , Xn)dxi • • ■ dXn 

(}ogn)^ 


= (^1 + 0 

= (l + o 


= 1 + 0 


n(logn)^ 

(logg)2 

n(logn)^ 

(logg)2 

n(logn)^ 

logg 


' Xl>--->Xn 
1 

{271)0^ 


exp 


))M 


' Xl>--->Xn 
poo 


1 + 0 


logg 


dxi ■ ■ ■ dXr, 


J-c 


exp ( -- ( 1 + O 


(logn)'^ 

logg 


dt 


1 

n\ 


since the integrand is symmetric in $x_1, \ldots, x_n$.

Combining this with Normal Approximation Result 1, we obtain


P(X(g,ai) > X(g, 02) > • • ■ > X(g, 0^)) =1 + 0 


logg 


n 




n\ 


+fe)‘'V' 
If $n \leq \log q/(\log\log q)^4$ then $1/n! \geq 1/n^n = q^{-o(1)}$, so the second "big Oh" term may be
absorbed into the first, and Theorem 1.2 follows. □


6. The leader: Proof of Theorem 1.4


We may assume that g is sufficiently large, otherwise the theorem is trivial. Then we 
may assume that n > log°'^ g, say, otherwise the theorem follows from Theorem 11.21 Let 
Z = {Zj)i<j<n denote a multivariate normal random vector whose components have mean 
zero, variance one, and correlations = KZiZj = Bg{ai, aj)/Vax^q). 

In view of Theorem 11.51 and Correlation Estimate 1, and our assumption that n > log°'^ g. 


\F(Zi > max ZA 

2<i<n 


< 

^-100 + ^- 1.99 ^ |+.|+^- 2.99 



2<i<n 

2<i<j<-n 

■c 

-100 Vnhg^n nlog^n 

^ 1.99 logg n^-ooiogg 


< 

log^ n 1 log^ n 


ni.^9 Jog q n log g 



Theorem 1.4 follows by combining this estimate with Normal Approximation Result 1.


7. Ordering the first $k$ contestants: Proof of Theorem 1.6

Throughout this section we let $Z_1, \ldots, Z_n$ denote mean zero, variance one, jointly normal
random variables corresponding to a prime number race modulo $q$ (i.e. with off-diagonal
correlations $r_{i,j} = \mathbb{E}Z_iZ_j = B_q(a_i,a_j)/\mathrm{Var}(q)$), where $q$ is large.

Our key tool in proving Theorem 1.6 will be the version of Lemma 4.6 that arises from
specializing to the prime number race situation. We record this now, together with an estimate
for the number of large values of the $|r_{i,l}|$ that we shall need when deducing it.


Lemma 7.1. For any k + 1 < i < n and any j > 1 we have 


# 


1 < I < k : \ri^i 


> 


2t log g 




Proof. Let Sj denote the set of 1 < Z < for which \ri^i\ > 271 ^• Using Correlation Estimate 
1, we have 


leSj 


y^\0g\2ffS,) 

logg 


_iq. c . 

On the other hand, by dehnition of Sj the left hand side is > lemma follows 

by rearranging. □ 


Lemma 7.2. Let $k$ be a positive integer such that $k/\log q$ is small enough, and suppose
$n \geq k+2$ is large. Let $1 \leq A \leq 2\sqrt{\log n}$ be real. If $x = (x_1, \ldots, x_k)$ is such that $x_1, \ldots, x_{k-1} \geq
x_k \geq A$ and $\|x\| \leq 10\sqrt{\log q}$, then we have
P( max Zi < Xk\Zi = xi,Zk = Xk) = TT $( 

-a 2 f 1+0(-^fc/log q) j n(log nY 


O.. 

logq 

for certain numbers Wi that satisfy 

Wi = \ 1 + O 


Wi 


i=k-\-l 


JJ $ ((1 + 0 (l/logg))wi) + e 


-0(A2{log g)/(log log g)9) 


\i=k+l 


(log qY 


Xk + 0 ^ y^^xilrjf 

1=1 


Proof. We want to apply Lemma 14.61 and Lemma 12.21 shows that we may do so with e x 
1/logg. We also have Bq{a,b) < Clogq (when a 7 ^ 6 ) by fl2.7p . and therefore we may take 
ei X l/(p(g), which is very small. It is more difficnlt to determine a permissible valne of 62 , 
bnt we may do so using Lemma 17.11 together with Correlation Estimate 1. Indeed, we can 


write = Ea< 2 iogfcE«eSu where Sa ■.= {I < I < k 

write = Efe< 2 ioBA:E.er.> where T,, := {1 < s < A; : 


1 


2“ log q 
1 

2** log q 


< lc,z| < 


1 


2“-l log q 
1 

C+l — 2^^ 


}, and 
} (with 


.tb<2logk L- — ' — - 2'^ logq ' rj,^i — log ij - 

suitable adjustments to the dehnitions of Si, S'L 2 iogA:j, TL 2 iogfej to ensure that all indices 
l,s are included). Then Lemma rO implies that ifSa 2^“a'^ and and using 

Correlation Estimate 1 to bound all the sums of |rj ^1 over I G Sa and s G T^, we obtain that 

t t «iog 72 ^)bilpti|!M = 'siptl « 

/ = 1 S = l,S^l 


log^g 


logg 


log g 


log g 


By dividing up the sum over t according to the size of YiY, one can similarly show that 

^ ' ,2^0 7.11_2^07.1 n_1_15 


|r/,t||rs,t| < log( 2 fc) 




log {2k) log {2k) ^ (log log g)- 
log g log g 


log^g 


Putting these estimates together with the standard bounds lc,zl ^ ^ 10 ^^^^^ 


|r’z,s| ^ ^^°og^q^^ 7 (coming from Correlation Estimate 1), one checks that it is 


l^S 

missible to take 


per- 


£2 


(log log qf 
log^q 


whereupon Lemma [4.61 implies that 


( max 

k-\-l<i<n 


Zi < Xk\Zi = xi, ...,Zk = Xk) - <h(wj) 

i=k-\-l 

^ <J) _|_ g-©(£l^(log'?)/(loglog9) 


\i=k+l 



E 


^,J\ 


k+l<i<j<n 


EE 

^ = 1 \ 2 =fc + l 


where tCj = (1 + 0 ( 1 / logg))tCi. 
Finally, the stated form of the result follows by using Correlation Estimate 1 again to
show that the second bracket on the right hand side is

$$\ll \frac{n \log^2 n}{\log q} + \frac{nk \log^4 n}{\log^2 q} \ll \frac{n \log^4 n}{\log q}.$$

□


We have now collected all the necessary ingredients for the proof of Theorem 1.6.


Proof of Theorem \1.(A Let c > 0 be a small constant to be hxed later, and suppose hrst 
that cn/logn < k < n. Then our condition fclog^^fc < (log g)/log n implies that n <Cc 
(logg)/(loglogg)^°, in which case Theorem 11.21 is applicable and implies that 

{n-k)\ f f n{\ogny\\ {n - k)\ f f k{logkyiogn\ 

4(g; ai,..., On) = -j— U + O ;- =-j— 1 + 0 -^- 

n\ \ \ logg // n! \ \ logg / 

This result is already acceptable for Theorem 11.61 so we may assume henceforth that we are 
in the other case where 2 < k < cn! \ogn. We may also assume throughout that n > log'^'^ g, 
because otherwise the result again follows from Theorem 11.21 regardless of the value of k. 
Since we assume that g is large, we may assume in particular that n is large. 

First, it follows from Normal Approximation Result 1 together with the discussion in 
section |2] that 



|4(g; ai, • • • ,an) - IP(^1 > ^2 > ••• > > max Zi)| < 


n 


,f(qYir 


k-\-l<i<n 

This error is completely negligible for Theorem 11.61 since we assume k\og^^ k < (logg)/ logn 
and (log g) / log n is large. 

Let 1 < A < 2\/logn be a real parameter to be chosen later. Then we have 


P(Zi > Z2> ... > Zk > max Zi) 

k-\-l<i<n 

= P(Zi > ... > Zk > max Zj, and > A) + O f P f 

k-\-l<i<n Y 


max Zi < A 

k+l<i<n 


and using Normal Comparison Result 2 with e x 1/ log g we get 

g-AV2+0(AV log g+AS+Sh 


P 


max Zj < A -C exp 

fc+l<2<n 


-0 n- 


A + B 


_l_ g-©(B^logg) 


for any R > 0. Simply choosing R = 1, and using that 1 < A < 2^1ogn < 2i/log g and that 
k log^° k < (log g) / log n, we deduce 


(7.1) 


< A ) -C 


Ap2+0(A) ' 


n 


-2k 


Ultimately we will choose A so this whole thing is <C n -C nioA-^g ’ t)e an 

acceptable error term. 
Next, we have

(7.2) 

P(Zi > Z 2 > ... > Zk > max Zi, and Zk > A) 

k-\-l<i<n 


' xi>...>Xk>A 


/(ti, ...,a;fc) ■ p( max Zi < Xk\Zi = xi,Zk = xAdxi ■ ■ ■ dxk 

\ k-\-l<i<n / 

/(xi, ...,Xfc) ■ p( max Zi < Xk\Zi = Xi, ...,Zk = Xk]dxi-■ ■ dxk + 

1 \ k-\-l<i<n / 


I xi>...>Xk>A 
\\x\\<3\/k logn 

where as before / denotes the joint density of Zi ,..., Zk, and where the second eqnality follows 
becanse by Proposition 15.11 (and the fact that k logg) we always have f{xi,...,Xk) -C 


(^^exp 

(7.3) 


. Moreover, we note that when ||x|| < 3^/klogn, Proposition 15.11 implies 


f{xi, ...,Xk) = 1 + 






log q J J (27r)^/^ 


exp 


X 


Now let ns note that <h(i/ ± e) = (l + for any ?/, e > 0. In view of 

this, and the crnde bonnd ^ (1/ log q)y/k\J y1’i=i^‘i ^ a//c/ logg, the main term 

nr=fc+i ^ ('^*) Lemma 17^ estimate for P^maxfc+i<j<„ Zj < Xk\Zi = xi, ...,Zk = Xk^ 

eqnals 

(7.4) 


JJ $ ( (1 + 0(l/logg))xfc + of 


i=k-\-l 

n 


JJ $((l+ 0(l/logg)) 


Xk 


i=k-\-l 


1=1 
n 

n 

i=k-\-l^ 


1 + 0 


1=1 


= $( (1 + 0(l/logg))xfc) exp < 


n—k 


0\ e 


i+o(\/fc/iogg) 


E E 

i=fc+l, 1=1 


xi\ri^i\ 


> . 


Now for any given $x_1 > x_2 > \dots > x_k > A$ satisfying $\|x\| \le 3\sqrt{k \log n}$, and any non-empty 
subset $S \subseteq \{k+1, k+2, \dots, n\}$, we can divide the sum $\sum_{i \in S} \sum_{l=1}^{k} |x_l| |r_{i,l}|$ into $O(\log(2k))$ pieces 
depending on the size of $x_l$ (on dyadic ranges), and apply Correlation Estimate 1 (similarly 
as in the proof of Proposition 5.1) to deduce that 


EEUr,.,! < log(2fc) I |x| I ^(2^#*^) 

ies 1=1 ^ 


•C V(#5)/clogn 


log^(2fc#5) 

logg 


Bnt if iS ;= {fc + 1 < z < n : then the left hand side mnst also be 

— ^ dednce that #5 fc(logn) log® (2fclogn) fc(logn)(loglogg)®. 
Substituting back this size estimate for S implies that 

^ /c(logn)(loglogg)® 


5^ 

i=k-\-l, 1=1 

Collecting the above estimates shows that 

$ (wi) = $((! + 0(l/logg))a;fc)”"''exp <| O ( e 

i=k^l 


logq 


1 + 0 ( 1 /A:/ log q) j /c(log?7,)(loglogg)® 

logg 


and clearly we have the same estimate for $\prod_{i=k+1}^{n} \Phi((1 + O(1/\log q)) w_i)$, which appears in 
the error term of Lemma 7.2. 

At this point we choose $A$ such that 

(7.5) 

\[
e^{0.51 A^2} = \frac{n}{k \log n}.
\]

Note that since we assume that $2 \le k \le cn/\log n$, this choice of $A$ will satisfy $1 \le A \le 
2\sqrt{\log n}$ provided $c$ is fixed small enough. Notice also that if $n > \log^2 q$, then $k \log n \ll 
\log q < \sqrt{n}$ and so $A \gg \sqrt{\log n}$. Substituting our choice of $A$ back into (7.1) yields 

(7.6) 

\[
\mathbb{P}\Big(\max_{k+1 \le i \le n} Z_i \le A\Big)
\ll \exp\Big(-\Theta\Big(\frac{e^{0.01 A^2 + O(A)}\, k \log n}{A}\Big)\Big) + n^{-2k} \ll n^{-2k},
\]

provided $c$ was chosen small so $A$ is large enough. With this choice of $A$ we also have 


= e 


-a2 (i+o(i/fc/iogg) j n(logn)^ -a2 ( 0.49+0 (-/fc/ log g) j kilognY ^ A:(log log 


n 


log q log q log q 

and similarly (distinguishing cases according as k > ^/n > log°'°® q or not) that 

M+o(-/fc/iogg) j A:(logn)(loglogg)® fc(logn)(log fc)® 




< 1 . 


log q log q 

Substituting into Lemma [721 and using our previous computations of HiLfc+i ('^0 

assumption that k (logg)/(loglogg)^°, yields that 

(7.7) 

k log n log® k 


P( max Zi<Xk\Zi = xi,...,Zk = Xk) = ^{{l+0{-^))xkY^(l + 0\- 
A+i<i<n logg ^ \ \ 


logg 


+0(n 


Now, in view of the estimates (7.3) and (7.7), the main term on the right hand side of 
(7.2) equals 

^ ^ e~\\^\\^/‘^^[{l + 0{l/\ogq))xkT~''dxi...dxk 


lxi>---> Xk>A, (27r)^A 
||a:||<3\/fclogn 




(27r)^A 


e ((1 + 0(1/logg))a;fc)"' ^ dxi...dxk + 0{n ^^). 
Thus Theorem 1.6 will certainly follow if we show that 

.g-lhlh/2^ ((1 + 0(1/ \ogq))xkY~^ dxi...dxk = ( 1 + O 


k\ogn\\ {n — k)\ 
\ogq 


n\ 


Jxi> ->xk>A (27r)^0 

We shall only prove an upper bound, since an exactly similar argument would give a matching 
lower bound. Indeed, for a certain absolute constant $C > 0$ the integral is 

1 


< 


f xi>--->xic>A 


(27j-)fc/2 


((1 + O/logg)a:fc)"' ^dxi---dxk 


'Xk>A 


e-^fc/2 


$ ((1 + C/ logg)a;fc) 


n—k 


{k-i)\ 


' X>Xk 


\/^ 


k-l 


■dx 


dxk, 


by symmetry of the integration variables $x_1, \dots, x_{k-1}$. Making a substitution shows this is 


= 1 + 0 


logg 


,-(1+0(1/log g))+/2 


\n—k 


'^k>{l+T^)A 


X 




ixkY 

„-+/2 


k-l 


= 1 + 0 


logg 


(Al 1)! \yj x>(l+C/logq)~^X). \/27r 
(xk) 


■dx 


dxi 


e *1/2 






{k-l)\ 


f X>Xk 


g -+/2 


k-l 


■dx 


dxk, 


and here the term is = 1 + provided Xk < 3\/logn, say. 

Moreover (and as seen before), the contribution to the integral from the complementary 
range Xk > 3\/logn is -C since in this case we have ||a:|| > 3^/k\ogn. Hence, our 

integral equals 

k-l 


1 + 0 


= 1 + 0 


k logn 
logg 

k logn 


r 

'-k>(l+^,)A ^ 


{Xk) 


n—k 


,-+/2 


{k - 1)! \Jx>x^ 


■dx dxk + 0(n ) 


= 1 + 0 


logg 

k logn 
logg 






-e (xfc)"' ^ dxi - ■ ■ dxk + 0{n 


[ Zi> Z 2 > ... > Zk> max Zj, and > (1 + --)H ) + 0(n 

' fc+l<i<n logg / 


-2k 


where the Zi are independent standard normal random variables. Furthermore, the same 
argument leading to (7.6) shows that 


\[
\mathbb{P}\Big(\max_{k+1 \le i \le n} Z_i \le \Big(1 + \frac{C}{\log q}\Big) A\Big) \ll n^{-2k}.
\]

Finally, the asymptotic (7.8) follows from combining the above estimates, and using that the 
probability $\mathbb{P}(Z_1 > Z_2 > \dots > Z_k > \max_{k+1 \le i \le n} Z_i)$ equals $(n-k)!/n!$ by symmetry of the 
random variables $Z_1, Z_2, \dots, Z_n$. 

□ 
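
The symmetry fact used in the last step can be checked by brute force for small parameters: for exchangeable variables with no ties, every ranking of the $n$ players is equally likely, and exactly $(n-k)!$ rankings begin with players $1, \dots, k$ in that order. The sketch below enumerates rankings directly; the function names are illustrative only, and players are indexed from 0 for convenience.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def count_leader_orderings(n: int, k: int) -> int:
    """Count rankings (largest value first) of players 0..n-1 that start with 0, 1, ..., k-1."""
    target = tuple(range(k))
    return sum(1 for ranking in permutations(range(n)) if ranking[:k] == target)

def leader_probability(n: int, k: int) -> Fraction:
    """Probability of the event Z_1 > ... > Z_k > max of the rest, when all n! rankings are equally likely."""
    return Fraction(count_leader_orderings(n, k), factorial(n))

if __name__ == "__main__":
    for n, k in [(5, 2), (6, 3), (7, 2)]:
        print(n, k, leader_probability(n, k), Fraction(factorial(n - k), factorial(n)))
```

The enumeration costs $n!$ steps, so it is only meant as a sanity check for small $n$.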
8. Irregularities in the densities: Proof of Theorem 1.7 

As in the previous section, given distinct reduced residues $a_1, \dots, a_n$ mod $q$ we let $Z = 
(Z_j)_{1 \le j \le n}$ be a multivariate normal random vector whose components have mean zero, variance 
one, and correlations $r_{i,j} = \mathbb{E} Z_i Z_j := B_q(a_i, a_j)/\mathrm{Var}(q)$. Also, we let $2 \le k \le n$ be a fixed 
positive integer and let $f(x_1, \dots, x_k)$ denote the density function of $Z_1, \dots, Z_k$. We denote 
by $C = (r_{i,j})_{1 \le i,j \le k}$ the covariance matrix of $Z_1, \dots, Z_k$ (which is certainly invertible provided 
$q$ is large enough in terms of $k$), and let $\tilde{r}_{i,j}$ denote the entries of the inverse matrix $C^{-1}$. 

The key ingredient in the proof of Theorem 1.7 is the following result. 


Lemma 8.1. Let $k \ge 2$ be fixed, and let $q$ be large enough in terms of $k$. There exist distinct 
reduced residues $a_1, \dots, a_k$ modulo $q$ such that 


f{xi,X2, ...,Xk) 





log 2 -h o(l) 

- XiX2 -;- 

logg 


+ Ok 



where the $o(1)$ term tends to zero as $q \to \infty$. 




Proof. Let $p_1 < p_2$ be the smallest prime numbers such that $(p_1 p_2, q) = 1$. Then one has 
$p_1 < p_2 \le 2 \log q$ in view of the fact that $\prod_{p \le z} p = e^{(1+o(1))z}$, which follows from the prime 
number theorem. Mimicking a construction used in Theorem 2 of Lamzouri Ha. we let 
Oi = 1, 02 = —1 and Oj = (piP 2 )'^ for 3 < j < k. Then by Lemma YL2\ and equation fl2.3|) . it 
follows that 


( 8 . 1 ) 


Li,2 = L2,1 


(log 2 ) + o(l) 
logg 


and 

(8.2) 


’■ij < 


(2 logg)^^"*"^ 

<p(g) 


for all $i \ne j$ such that $\{i, j\} \ne \{1, 2\}$. Now, by (5.4) we have 


f{.Xi,.. 






Recall that the diagonal entries of $C^{-1}$ are $1 + O_k(1/(\log q)^2)$ by (5.3). Furthermore, by (5.3) and (8.2) we have 




1 

(log g) 2 ’ 


for all $i \ne j$ such that $\{i, j\} \ne \{1, 2\}$. This implies 


f{xi, ...,Xk) 




\x\ 


^ 2,1 + Li ,2 ^ 
XlX2^— -^ + Ok 



(8.3) 
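
For orientation, the shape of (8.3) can be checked by hand in the simplest case $k = 2$. The following worked computation is only an illustrative sketch: it writes $r$ for the single off-diagonal correlation (a symbol introduced here for brevity) and assumes $|r| \le 1/2$.

```latex
\[
C = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix},
\qquad
C^{-1} = \frac{1}{1 - r^2} \begin{pmatrix} 1 & -r \\ -r & 1 \end{pmatrix},
\]
\[
f(x_1, x_2)
= \frac{1}{2\pi \sqrt{1 - r^2}}
  \exp\Big( -\frac{x_1^2 + x_2^2 - 2 r x_1 x_2}{2(1 - r^2)} \Big)
= \frac{1}{2\pi}
  \exp\Big( -\frac{\|x\|^2}{2} + r x_1 x_2 + O\big( r^2 (1 + \|x\|^2) \big) \Big).
\]
```

In particular the off-diagonal entries of $C^{-1}$ are $-r + O(|r|^3)$, so in this two-variable model a negative correlation $r = -\lambda$ produces the cross term $-\lambda x_1 x_2$ in the exponent.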
Now, let $A$ be the matrix obtained from $C$ by removing the first row and the second 
column. Then arguing as in the proof of Lemma 3.3 we obtain 


det(^) 

det(C) 




^r2,i + Ok 


( (21ogg)2^+i\\ 

V 


Thus by fl8.ll) we obtain r 2 ,i = ? and a similar estimate holds for ri^ 2 - Inserting these 

estimates in fl8.3p completes the proof. □ 
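
The first step of the proof, choosing the two smallest primes coprime to $q$ and noting that they are at most $2 \log q$ once $q$ is large, is easy to experiment with numerically. The sketch below uses plain trial division and hypothetical helper names; it is only meant for small illustrative moduli, and the bound $2\log q$ should not be expected to hold for very small $q$.

```python
import math

def is_prime(m: int) -> bool:
    """Trial-division primality test, adequate for the small primes needed here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def two_smallest_coprime_primes(q: int) -> tuple[int, int]:
    """Return the two smallest primes p1 < p2 that do not divide q."""
    found = []
    p = 2
    while len(found) < 2:
        if is_prime(p) and q % p != 0:
            found.append(p)
        p += 1
    return found[0], found[1]

if __name__ == "__main__":
    # Worst case for this step: q divisible by every small prime.
    q = math.prod([2, 3, 5, 7, 11, 13, 17, 19, 23, 29])
    p1, p2 = two_smallest_coprime_primes(q)
    print(q, p1, p2, 2 * math.log(q))
```

Moduli that are products of consecutive small primes are the extreme case, which is why the prime number theorem estimate for $\prod_{p \le z} p$ is the relevant input.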


Proof of Theorem 1.7. By the same argument that was used in the Introduction to deduce 
Theorem 1.3 from Theorem 1.7, it will suffice to prove Theorem 1.7 in the case $k = 2$, 
since this implies the result for larger $k$. (Using Lemma 8.1 one could in fact prove the 
result directly for any fixed $k$, but this would be more complicated and require an unwanted 
additional assumption of the shape n < at the end.) 

Let $a_1 = 1$ and $a_2 = -1$ be as in Lemma 8.1, and let $a_3, \dots, a_n$ be distinct reduced 
residues modulo $q$ that are different from $a_1, a_2$. As in the proof of Theorem 1.6, Normal 
Approximation Result 1 implies that 


\52{q] Oi, . . . , On) - P(Zi > Z2 


> max 




Next, we have 


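\[
\mathbb{P}\Big(Z_1 > Z_2 > \max_{3 \le i \le n} Z_i\Big)
= \mathbb{P}\Big(Z_1 > Z_2 > \max_{3 \le i \le n} Z_i, \text{ and } Z_2 > \sqrt{1.99 \log n}\Big)
+ O\Big(\mathbb{P}\Big(\max_{3 \le i \le n} Z_i \le \sqrt{1.99 \log n}\Big)\Big),
\]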


and using Normal Comparison Result 2 we obtain 

(8.4) 

\[
\mathbb{P}\Big(\max_{3 \le i \le n} Z_i \le \sqrt{1.99 \log n}\Big) \ll e^{-c_1 n^{c_2}} + n^{-4}
\]

for some positive constants $c_1, c_2$. Moreover, similarly to (7.2) we derive 
P(Zi > Z 2 > Zi, and Z 2 > a/ 1.99logn) 

(8.5) f 


3<i<n 


Ixi>X2>\/1.99 logn 
||a;||<3\/21ogn 


f{xi, X 2 ) ■ P( max Zi < X 2 \Zi = xi, Z 2 = X 2 ]dxidx 2 + O in 

\ 3<i<n / 


Next, it follows from Lemma 7.2 that for all $x = (x_1, x_2)$ such that $x_1 > x_2 > \sqrt{1.99 \log n}$ 
and $\|x\| \le 3\sqrt{2 \log n}$ we have 


P(max Zi < X 2 \Zi = xi, Z 2 = X 2 ) = Y\ ^ {wi) 

3<i<n 

i=3 


o 


77,9/10 log q 


JJ d) ((1 + 0(1/logg))n;i) + 


n 


-4 


vj=3 


( 8 . 6 ) 
where $w_i = (1 + O(1/\log^2 q)) x_2 + O(\|x\| \sum_{s=1}^{2} |r_{i,s}|)$. Then, similarly to (7.4), one gets 


$ {wi) = ]^ $ ((1 + 0(1/ log^ q))x2) ll + O 
i=3 i=3 V 

= $ ((1 + 0(1/ log2 q))x2y~" (l + O (- 


n 2 


X • e 


-(l/2+0(-^l/logg))a;| ^ ^ | 


2=3 s = l 


log 

by Correlation Estimate 1 and the fact that $x_2 > \sqrt{1.99 \log n}$. Moreover, in the same range 
for the $x_i$ we deduce from Lemma 8.1 that 


f{Xi,X2) =1 + 0 


( 


n 


\(\ogqyj J (27r)20 

< ( 1 _ ±e-lhllV2 


exp 


\x 


P log2 + o(l) 

- X 1 X 2 


\ogq 


logg / 2n 

for some positive constant $c > 0$, provided $q$ is large enough. 

Combining the above estimates, and using our assumption that n > ^{qY, we deduce 
that the main term in fIS.Sp equals 


Ixi>X2>\/1.99 logn 
||a;||<3-v/21ogn 


f{Xi,X 2 ) ■ n <h {wi) dxidx 2 


< (1 - o) 


i=3 

1 


la;i>X2>\/1.99logn ^TT 
||a:||<3-v/21ogn 


e ((1 + 0(1/log^ g))a; 2 )” dxidx 2 . 


for some positive constant q. Furthermore, using an exactly similar argument as in the proof 
of fl7.8p . together with fl8.4l) . we derive 


|xi>a;2>\/1.99logn 27r 
||a;||<3\/2 logn 


e (^(1 + 0(1/log^ q))x 2 )^ dxidx 2 


= 1 + 0 


logn \ \ (n — 2)! 


log^g 


n! 


= 1 + 0 


logg 


(n-2)! 


n! 


Finally, the total contribution of the various error terms (notably the one from Normal 
Approximation Result 1 and the one from (8.6)) is 


•C 


n 




g?(g)^/® ' ' n°-^logg n! 

Recalling our assumption that +(g)'^ < n < +(g)^'^^^, we see the error is negligible compared 
with , which completes the proof. 

□ 


Appendix A. Sketch proof of Lemma 3.2 

In this appendix we sketch the proof of Lemma 3.2, the harmonic analysis lemma that 
we used to prove Correlation Estimate 1. Lemma 3.2 is inspired by a result in work of 
Bourgain [1] (in a substantially different context), but differs in that it is concerned with 
two sets of points $\theta_r$ and $\phi_s$ rather than one, and it involves a weight $\Lambda(q)/q$ rather than $1/q$. 
It turns out that the former adaptation is easy, and the latter one simplifies and strengthens 
the argument (since all work with divisor functions becomes much easier). For the sake of 
completeness, we provide a fairly full sketch proof of Lemma 3.2 here. 

Throughout we let $\mathbf{1}$ denote the indicator function, and let $\|\cdot\|$ denote distance mod 
1. Recall that we are given two sets $\theta_1, \dots, \theta_R$ and $\phi_1, \dots, \phi_S$ of $1/x$-spaced real numbers. 
Corresponding to these, let $I_\theta, I_\phi : \mathbb{R}/\mathbb{Z} \to [0, 10]$ be bounded variation continuous functions 
on the real numbers mod 1, that satisfy 


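\[
I_\theta(t) \ge \sum_{1 \le r \le R} \mathbf{1}_{\|t - \theta_r\| \le 1/x}
\quad \text{and} \quad
I_\phi(t) \ge \sum_{1 \le s \le S} \mathbf{1}_{\|t - (-\phi_s)\| \le 1/x},
\]

(note the negative signs attached to the $\phi_s$ here), and also 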


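\[
\int I_\theta(t)\,dt \le \frac{1000 R}{x}, \qquad
\int I_\phi(t)\,dt \le \frac{1000 S}{x}, \qquad
\mathrm{supp}(\hat{I}_\theta),\ \mathrm{supp}(\hat{I}_\phi) \subseteq [-x, x],
\]

where $\hat{I}_\theta(k) := \int I_\theta(t) e^{-2\pi i k t}\,dt$ and $\hat{I}_\phi(k)$ (for $k \in \mathbb{Z}$) denote the Fourier transforms of $I_\theta, I_\phi$. 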

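It is a standard fact that one can construct such functions $I$ (e.g. as a sum of Beurling-Selberg 
type smooth functions that approximate $\mathbf{1}_{\|t - \theta_r\| \le 1/x}$ and $\mathbf{1}_{\|t - (-\phi_s)\| \le 1/x}$), and we see 
immediately that the convolution 

\[
(I_\theta * I_\phi)(t) := \int I_\theta(u) I_\phi(t - u)\,du
\ge \sum_{\substack{1 \le r \le R, \\ 1 \le s \le S}} \int \mathbf{1}_{\|u - \theta_r\| \le 1/x}\, \mathbf{1}_{\|t - u + \phi_s\| \le 1/x}\,du
\ge \frac{1}{x} \sum_{\substack{1 \le r \le R, \\ 1 \le s \le S}} \mathbf{1}_{\|t - (\theta_r - \phi_s)\| \le 1/x}.
\]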


Consequently, in Lemma 3.2 we have 


l<r<R, 

l<s<5 


q<Q 


a=0 l<r<R, 
l<s<S 


q<Q 


a=0 


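and on writing $(I_\theta * I_\phi)(a/q)$ in terms of its Fourier coefficients we find 

\[
\sum_{a=0}^{q-1} (I_\theta * I_\phi)(a/q)
= \sum_{a=0}^{q-1} \sum_{k=-\infty}^{\infty} \widehat{(I_\theta * I_\phi)}(k)\, e^{2\pi i k a/q}
= q \sum_{\substack{k=-\infty, \\ q \mid k}}^{\infty} \widehat{(I_\theta * I_\phi)}(k).
\]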
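
The identity just used, that summing a function over the points $a/q$ picks out exactly the Fourier coefficients with $q \mid k$, can be verified numerically for a trigonometric polynomial. The following sketch is a generic check of that orthogonality relation and is not tied to the specific functions $I_\theta, I_\phi$; the coefficient dictionary and function names are assumptions made for illustration.

```python
import cmath

def sample_sum(coeffs: dict[int, complex], q: int) -> complex:
    """Compute sum_{a=0}^{q-1} f(a/q) for f(t) = sum_k coeffs[k] * exp(2 pi i k t)."""
    total = 0j
    for a in range(q):
        for k, c in coeffs.items():
            total += c * cmath.exp(2j * cmath.pi * k * a / q)
    return total

def aliased_sum(coeffs: dict[int, complex], q: int) -> complex:
    """Compute q * (sum of the coefficients whose frequency k is divisible by q)."""
    return q * sum(c for k, c in coeffs.items() if k % q == 0)

if __name__ == "__main__":
    # A sample trigonometric polynomial with frequencies -12..12.
    coeffs = {k: 1.0 / (1 + k * k) for k in range(-12, 13)}
    for q in (1, 3, 4, 7):
        print(q, sample_sum(coeffs, q), aliased_sum(coeffs, q))
```

The two quantities agree up to floating point error because $\sum_{a=0}^{q-1} e^{2\pi i k a/q}$ equals $q$ when $q \mid k$ and vanishes otherwise.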

Using also the fact that $\widehat{(I_\theta * I_\phi)}(k) = \hat{I}_\theta(k) \hat{I}_\phi(k)$, we conclude overall that 

CXD CO 

^ G(B,-4,,)<xY^k(q) h(k)T4k)<xY 

g^Q fc=—CO, k=—ooq<Q, 

l<s<S q\k q\k 

To hnish, dehne B := {—x < k < x : k ^ 0, > ^og^{‘^QdiS)}, and note that 

<i\k 

since Ig,I^ are supposed to vanish outside the interval [—a;,a;], and since we always have 
Yx<i<qA{(1) < Yjq<Q^id) ^ Q also \Tg{k)\ < Ig{t)dt < and \Bt,{k)\ < the 

q\k 
right hand side above is 

< \og\2QRS)x Y, |f 0 (A;)|| 4 (A;)|+Qx|f 0 (O)|| 4 (O)|+Qx 5 ^|J,(fc)|| 4 (A;)| 


—x<k<x 


keB 


« \og‘{2QRS)x J2Wk)\\ Y.Mk)\^ + 


RSQ ^ RSQ 


X 


X 


k y fc 

The second term above is acceptable for Lemma 3.2. By Parseval’s Identity we have 




leitydtA / I^(tydt -C 


X 


, and so the hrst term above is also acceptable. Now since we always have — 

q\k 

J2g\k-^i^) ~ loS 1^1 provided 7 ^ 0 , there can be no elements of B with modnlns less than 
exp{log^(2Qi?S')}. Moreover, if exp{log^(2Q/2S')} < k < x and R{q) > log^(2Qi?S') 

q\k 

then we must be able to write $k = nm$, where $n$ is $Q$-smooth (i.e. all the prime factors of $n$ 
are at most Q) and also n > exp{log^(2Qi?S')}. Therefore we have 

*B<2 Y 

exp{log^(2Q RS)}<n<x, 
nis Q smooth 

and standard upper bounds for the counting function of smooth numbers (see e.g. Theorem 
7.6 of Montgomery and Vaughan [19]) imply the right hand side is <C x/{QRSy^. It follows 
that ^^yB l/{QRSy < 1, which is certainly an acceptable contribution for Lemma 

K2\ □ 

Appendix B. Sketch proof of Lemma 4.1 

In this appendix we very briefly indicate how to deduce Lemma 4.1 from Theorem 2.1 of 
Reinert and Röllin [21]. 

Indeed, the exchangeable pair construction and calculations used to deduce Central Limit 
Theorem 1, in an appendix of the preprint [9], transfer directly to this situation and imply 
that 

3 


^ E |Ci(“)l''|c<('>)PE|l/.|- + |ft|3^EVT I 5^ |c.(a)| 


2=1 


2£A 


\Eh{W)-Eh{Z)\ < \h\2 Y 

a,beA 

In Lemma [4.11 we assume the uniform fourth moment bound ElVl"^ < and so the 

hrst term is 

\h\2K^ 




m 




which is acceptable for Lemma 4.1. We also note that 

E|0“ < T1 + < ^ + ^EIKI* < 2^. 
and therefore the second term above is 


as required for Lemma 4.1. 


■C 


\hUK^ ^ 

m3/2 


i=l 



□ 